Re: Hadoop 2.6 compatibility?

2014-12-20 Thread Sean Owen
To clarify, Ted's got the right formula. You would use -Phadoop-2.4 to set up the build configuration, and then customize -Dhadoop.version= further to the 2.4+ version you want. On Sat, Dec 20, 2014 at 12:35 AM, Denny Lee denny.g@gmail.com wrote: To clarify, there isn't a Hadoop 2.6
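A minimal sketch of the formula described above, assuming a Maven build and 2.6.0 as the target Hadoop version. Only -Phadoop-2.4 and -Dhadoop.version= come from the message. The helper name build_cmd, the version value, and the trailing Maven goals are illustrative assumptions. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark): assembles a build command that
# selects the hadoop-2.4 profile and then overrides hadoop.version.
# Assumption: "-DskipTests clean package" is only an example set of Maven goals.
build_cmd() {
  profile="$1"   # build profile, e.g. hadoop-2.4 (from the message above)
  version="$2"   # 2.4+ Hadoop version to build against (example value)
  echo "mvn -P${profile} -Dhadoop.version=${version} -DskipTests clean package"
}

# Print the command for inspection instead of executing it.
build_cmd "hadoop-2.4" "2.6.0"
```

The same two flags can be passed to ./make-distribution.sh, as the thriftserver thread further down does with a CDH version string.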

Re: Including data nucleus tools

2014-12-20 Thread Jakub Dubovsky
Hi DB, I cherry-picked the commit into branch-1.2 and it solved the problem. It solves the problem, but it has some bits and pieces around it that were not finalized, so it was reverted for being late in the release process. Jakub -- Just out of curiosity: do you manually apply this patch and see if

Are failures normal / to be expected on an AWS cluster?

2014-12-20 Thread Joe Wass
I have a Spark job running on about 300 GB of log files, on Amazon EC2, with 10 x Large instances (each with 600 GB disk). The job hasn't yet completed. So far, 18 stages have completed (2 of which have retries) and 3 stages have failed. In each failed stage there are ~160 successful tasks, but

v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
First, thanks for the efforts and contribution to such a useful software stack! Spark is great! I have been using the git tags for v1.2.0-rc1 and v1.2.0-rc2 built as follows: ./make-distribution.sh -Dhadoop.version=2.5.0-cdh5.2.0 -Dyarn.version=2.5.0-cdh5.2.0 -Phadoop-2.4 -Phive -Pyarn

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Mark Hamstra
This makes no sense. There is no difference between v1.2.0-rc2 and v1.2.0: https://github.com/apache/spark/compare/v1.2.0-rc2...v1.2.0 On Sat, Dec 20, 2014 at 12:44 PM, Matt Mead m...@matthewcmead.com wrote: First, thanks for the efforts and contribution to such a useful software stack!

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
Bizarre. I originally cloned from and have been pulling from https://github.com/apache/spark, and my repo shows the following: user@host:~/development/spark$ git diff v1.2.0-rc2..v1.2.0 | wc -l 1898 If I pull a fresh clone, I get this: user@host:~$ git clone https://github.com/apache/spark

Re: v1.2.0 (re?)introduces Wrong FS behavior in thriftserver

2014-12-20 Thread Matt Mead
That seemed to correct the issue. Thanks for pointing out the lack of diffs between v1.2.0-rc2 and v1.2.0 -- I'm not sure how my git repo ended up not matching its origin. -matt On Sat, Dec 20, 2014 at 4:25 PM, Matt Mead m...@matthewcmead.com wrote: Bizarre. I originally cloned from and

Re: EC2 VPC script

2014-12-20 Thread Nicholas Chammas
What version of the script are you running? What did you see in the EC2 web console when this happened? Sometimes instances just don't come up in a reasonable amount of time and you have to kill and restart the process. Does this always happen, or was it just once? Nick On Thu, Dec 18, 2014 at

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Nicholas Chammas
Is the operation slow every time or does it run normally if you repeat the operation within the same app? Nick On Thu, Dec 18, 2014 at 8:56 AM, Jon Chase jon.ch...@gmail.com wrote: I'm running a very simple Spark application that downloads files from S3, does a bit of mapping, then uploads

Re: Downloads from S3 exceedingly slow when running on spark-ec2

2014-12-20 Thread Paul Brown
I would suggest checking out disk IO on the nodes in your cluster and then reading up on the limiting behaviors that accompany different kinds of EC2 storage. Depending on how things are configured for your nodes, you may have a local storage configuration that provides bursty IOPS where you get

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, can someone help me? Any pointers would help. Thanks, Subacini On Fri, Dec 19, 2014 at 10:47 PM, Subacini B subac...@gmail.com wrote: Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase? Thanks

Re: SchemaRDD to Hbase

2014-12-20 Thread Alex Kamil
I'm using JdbcRDD https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.rdd.JdbcRDD + HBase JDBC driver http://phoenix.apache.org/ + SchemaRDD https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.sql.SchemaRDD. Make sure to use Spark 1.2. On Sat, Dec 20,

How to deploy my java code which invokes Spark in Tomcat?

2014-12-20 Thread Tao Lu
Hi, Guys, I have some code which runs well using the spark-submit command: $SPARK_HOME/bin/spark-submit --class com.myorg.service.SparkService ./Search.jar How can I deploy it to Tomcat? If I simply deploy the jar file, I get a ClassNotFound error. Thanks!

spark-repl_1.2.0 was not uploaded to central maven repository.

2014-12-20 Thread Peng Cheng
Everything else is there except spark-repl. Can someone check that out this weekend? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-repl-1-2-0-was-not-uploaded-to-central-maven-repository-tp20799.html Sent from the Apache Spark User List mailing list

Re: Spark on Tachyon

2014-12-20 Thread Peng Cheng
IMHO: cache doesn't provide redundancy, and it's in the same JVM, so it's much faster. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Tachyon-tp1463p20800.html Sent from the Apache Spark User List mailing list archive at Nabble.com.