Re: How to deploy my java code which invokes Spark in Tomcat?

2014-12-21 Thread Akhil Das
If you are getting a ClassNotFoundException, then you should use the --jars option (of spark-submit) to submit those jars. Thanks, Best Regards On Sun, Dec 21, 2014 at 10:01 AM, Tao Lu taolu2...@gmail.com wrote: Hi, Guys, I have some code which runs well using the spark-submit command.
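
For readers hitting the same ClassNotFoundException, a typical spark-submit invocation that ships extra jars looks like the following sketch (the jar paths, main class, and master URL are placeholders, not from the thread):

```shell
# Ship third-party jars to both the driver and the executors with --jars.
# --jars takes a comma-separated list, with no spaces between entries.
spark-submit \
  --class com.example.MyApp \
  --master spark://master:7077 \
  --jars /path/to/dep1.jar,/path/to/dep2.jar \
  my-app.jar
```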

Re: spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-21 Thread Sean Owen
I'm only speculating, but I wonder if it was on purpose? Would people ever build an app against the REPL? On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng pc...@uow.edu.au wrote: Everything else is there except spark-repl. Can someone check that out this weekend?

Re: spark-sql with join terribly slow.

2014-12-21 Thread Cheng Lian
Hari, thanks for the details and sorry for the late reply. Currently Spark SQL doesn't enable the broadcast join optimization for left outer joins, so shuffles are required to perform this query. I made a rather artificial test to show the physical plan of your query: == Physical Plan ==

Re: Querying registered RDD (AsTable) using JDBC

2014-12-21 Thread Cheng Lian
Evert - Thanks for the instructions, this is generally useful in other scenarios, but I think this isn't what Shahab needs, because saveAsTable actually saves the contents of the SchemaRDD into Hive. Shahab - As Michael has answered in another thread, you may try

Re: Spark SQL DSL for joins?

2014-12-21 Thread Cheng Lian
On 12/17/14 1:43 PM, Jerry Raj wrote: Hi, I'm using the Scala DSL for Spark SQL, but I'm not able to do joins. I have two tables (backed by Parquet files) and I need to do a join across them using a common field (user_id). This works fine using standard SQL but not using the

Network file input cannot be recognized?

2014-12-21 Thread Shuai Zheng
Hi, I am running code which takes a network file location (not HDFS) as input. But sc.textFile("\\networklocation\README.md") can't recognize a network location starting with \\ as a valid location, because it only accepts HDFS and local file name formats? Anyone has an idea how I can use a
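
One common workaround (a sketch, not from the thread): Hadoop path resolution only understands URIs with a recognized scheme, so a UNC path like \\server\share has no scheme it accepts. Mounting the share on every worker and passing a file:// URI usually works; the mount point below is hypothetical.

```scala
// Sketch: sc.textFile accepts scheme-qualified URIs. Mount the network
// share locally (here assumed at /mnt/share on ALL workers) and use a
// file:// URI instead of the raw UNC path.
val lines = sc.textFile("file:///mnt/share/README.md")
println(lines.count())
```

Note that because executors read their own partitions, the share must be mounted at the same path on every node in the cluster, not just the driver.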

Re: SparkSQL 1.2.1-snapshot Left Join problem

2014-12-21 Thread Cheng Lian
Could you please file a JIRA together with the Git commit you're using? Thanks! On 12/18/14 2:32 AM, Hao Ren wrote: Hi, When running SparkSQL branch 1.2.1 on EC2 standalone cluster, the following query does not work: create table debug as select v1.* from t1 as v1 left join t2 as v2 on

Re: integrating long-running Spark jobs with Thriftserver

2014-12-21 Thread Cheng Lian
Hi Schweichler, This is an interesting and practical question. I'm not familiar with how Tableau works, but would like to share some thoughts. In general, big data analytics frameworks like MR and Spark tend to perform immutable functional transformations over immutable data. Whilst in your

Re: spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-21 Thread andy petrella
Actually yes, things like interactive notebooks, for instance. On Sun Dec 21 2014 at 11:35:18 AM Sean Owen so...@cloudera.com wrote: I'm only speculating, but I wonder if it was on purpose? Would people ever build an app against the REPL? On Sun, Dec 21, 2014 at 5:50 AM, Peng Cheng pc...@uow.edu.au

Find the file info when loading the data into an RDD

2014-12-21 Thread Shuai Zheng
Hi All, When I try to load a folder into RDDs, is there any way for me to find the input file name of a particular partition? So I can track which file each partition came from. In Hadoop, I can find this information through the code: FileSplit fileSplit = (FileSplit) context.getInputSplit(); String

Re: Find the file info when loading the data into an RDD

2014-12-21 Thread Shuai Zheng
I just found a possible answer: http://themodernlife.github.io/scala/spark/hadoop/hdfs/2014/09/28/spark-input-filename/ Will give it a try. Although it is a bit troublesome, if it works, it will give me what I want. Sorry to bother everyone here. Regards, Shuai On Sun, Dec 21, 2014 at 4:43

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread Michael Armbrust
With JDBC you often need to load the class so it can register the driver at the beginning of your program. Usually this is something like: Class.forName("com.mysql.jdbc.Driver"); On Fri, Dec 19, 2014 at 3:47 PM, durga durgak...@gmail.com wrote: Hi I am facing an issue with mysql jars with
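
A minimal sketch of the registration pattern Michael describes (the connection URL and credentials are placeholders):

```scala
// Force-load the JDBC driver class so its static initializer registers it
// with java.sql.DriverManager before any connection is requested.
import java.sql.DriverManager

Class.forName("com.mysql.jdbc.Driver") // registers the MySQL driver

// Hypothetical URL and credentials, for illustration only.
val conn = DriverManager.getConnection(
  "jdbc:mysql://localhost:3306/mydb", "user", "password")
```

The driver jar must of course also be on the classpath of whatever JVM runs this code (driver and executors alike), which loops back to the --jars discussion in the same thread.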

Re: Find the file info when loading the data into an RDD

2014-12-21 Thread Anwar Rizal
Yeah... but apparently mapPartitionsWithInputSplit is tagged as DeveloperApi. Because of that, I'm not sure that it's a good idea to use the function. For this problem, I had to create a subclass of HadoopRDD and use mapPartitions instead. Is there any reason
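
For reference, a sketch of the mapPartitionsWithInputSplit approach discussed here and in the linked blog post (the input path is hypothetical, and as noted above the method is @DeveloperApi in Spark 1.x, so it may change between releases):

```scala
// Sketch: recover the input file name per partition by dropping down to
// the underlying HadoopRDD and inspecting each partition's FileSplit.
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileSplit, InputSplit, TextInputFormat}
import org.apache.spark.rdd.HadoopRDD

val rdd = sc.hadoopFile[LongWritable, Text, TextInputFormat]("/data/folder")
val withFileName = rdd.asInstanceOf[HadoopRDD[LongWritable, Text]]
  .mapPartitionsWithInputSplit { (split: InputSplit, iter) =>
    val file = split.asInstanceOf[FileSplit].getPath.toString
    iter.map { case (_, line) => (file, line.toString) } // (filename, line)
  }
```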

Issue with Parquet on Spark 1.2 and Amazon EMR

2014-12-21 Thread Adam Gilmore
Hi all, I've just launched a new Amazon EMR cluster and used the script at: s3://support.elasticmapreduce/spark/install-spark to install Spark (this script was upgraded to support 1.2). I know there are tools to launch a Spark cluster in EC2, but I want to use EMR. Everything installs fine;

locality sensitive hashing for spark

2014-12-21 Thread morr0723
I've pushed out an implementation of locality sensitive hashing for Spark. LSH has a number of use cases, the most prominent being when the features are not based in Euclidean space. Code, documentation, and a small example dataset are available on GitHub: https://github.com/mrsqueeze/spark-hash Feel

Parquet schema changes

2014-12-21 Thread Adam Gilmore
Hi all, I understand that Parquet allows for automatic schema versioning in the format; however, I'm not sure whether Spark supports this. I'm saving a SchemaRDD to a Parquet file, registering it as a table, then doing an insertInto with a SchemaRDD with an extra column. The second
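
For concreteness, a sketch of the scenario being described, using the Spark 1.2-era SchemaRDD API (file paths, table name, and columns are hypothetical; whether the mismatched schemas are reconciled is exactly the question the thread raises):

```scala
// First version of the data, schema (id, name), written out as Parquet
// and registered as a table.
val v1 = sqlContext.jsonFile("/data/v1.json")
v1.saveAsParquetFile("/data/table.parquet")
sqlContext.parquetFile("/data/table.parquet").registerTempTable("t")

// Second version carries an extra column, schema (id, name, age).
// Inserting it into the existing table exercises Parquet schema evolution.
val v2 = sqlContext.jsonFile("/data/v2.json")
v2.insertInto("t")
```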

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
Hi All, I tried to build a combined.jar in a shell script. It works when I use spark-shell, but with spark-submit it is the same issue. Help is highly appreciated. Thanks -D

Re: java.sql.SQLException: No suitable driver found

2014-12-21 Thread durga
One more question: how would I submit additional jars to the spark-submit job? I used the --jars option, but it seems it is not working, as explained earlier. Thanks for the help, -D

S3 files , Spark job hungsup

2014-12-21 Thread durga
Hi All, I am facing a strange issue sporadically: occasionally my Spark job hangs while reading S3 files. It is not throwing an exception or making progress; it just hangs there. Is this a known issue? Please let me know how I could solve it. Thanks, -D

Re: locality sensitive hashing for spark

2014-12-21 Thread Nick Pentreath
Looks interesting, thanks for sharing. Does it support cosine similarity? I only saw Jaccard mentioned from a quick glance. — Sent from Mailbox On Mon, Dec 22, 2014 at 4:12 AM, morr0723 michael.d@gmail.com wrote: I've pushed out an implementation of locality sensitive hashing for
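
For readers unfamiliar with the measure under discussion, a minimal self-contained sketch of Jaccard similarity, the set-overlap measure minhash-based LSH approximates:

```scala
// Jaccard similarity of two sets: |A ∩ B| / |A ∪ B|.
// Two empty sets are conventionally treated as identical (1.0).
def jaccard[T](a: Set[T], b: Set[T]): Double =
  if (a.isEmpty && b.isEmpty) 1.0
  else (a intersect b).size.toDouble / (a union b).size

// jaccard(Set(1, 2, 3), Set(2, 3, 4)) == 0.5
```

Cosine similarity, by contrast, operates on real-valued vectors rather than sets, which is why an LSH scheme built on minhashing does not support it directly; cosine-based LSH typically uses random hyperplane signatures instead.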

Python:Streaming Question

2014-12-21 Thread Samarth Mailinglist
I’m trying to run the stateful network word count at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/stateful_network_wordcount.py using the command: ./bin/spark-submit examples/src/main/python/streaming/stateful_network_wordcount.py localhost I am also
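
Note that the stateful network word count example takes both a hostname and a port on the command line; a typical run pairs it with netcat as the text source (the port number below is just an example):

```shell
# Terminal 1: a simple text source for the streaming job to connect to.
nc -lk 9999

# Terminal 2: submit the example with hostname AND port.
./bin/spark-submit \
  examples/src/main/python/streaming/stateful_network_wordcount.py \
  localhost 9999
```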

Question about TTL with TorrentBroadcastFactory in Spark-1.2.0

2014-12-21 Thread 顾亮亮
Hi All, I am facing a problem when using TTL with TorrentBroadcastFactory in Spark-1.2.0. My code is as follows: val conf = new SparkConf(). setAppName("TTL_Broadcast_vars"). setMaster("local"). //set("spark.broadcast.factory",
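
The quoted code is truncated by the archive; a hedged sketch of the kind of configuration the question describes (the TTL value is illustrative, not from the message):

```scala
// Sketch: torrent broadcast factory combined with the periodic metadata
// cleaner. spark.cleaner.ttl is specified in seconds.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("TTL_Broadcast_vars")
  .setMaster("local")
  .set("spark.broadcast.factory",
       "org.apache.spark.broadcast.TorrentBroadcastFactory")
  .set("spark.cleaner.ttl", "3600") // hypothetical value, in seconds
val sc = new SparkContext(conf)
```

Broadcast variables older than the TTL are cleaned even if still referenced, which is a common source of surprises when combining these two settings.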