Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
I have 3 Spark Masters colocated with the ZK nodes and 2 Worker nodes, so my NameNodes are the same nodes as my Spark Masters and my DataNodes are the same nodes as my Spark Workers. Is that correct? How do I set up HDFS with ZooKeeper? On Fri, Feb 3, 2017 at 10:27 PM, Mark Hamstra
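
A minimal sketch of the Spark side of such a setup (zk1/zk2/zk3 are placeholder hostnames): standalone-master HA is enabled through spark.deploy.* properties in spark-env.sh on each master, while NameNode HA with ZooKeeper is configured separately on the HDFS side (hdfs-site.xml), not through Spark at all.

# spark-env.sh on each master: ZooKeeper-based recovery for standalone HA
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER \
  -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181,zk3:2181 \
  -Dspark.deploy.zookeeper.dir=/spark"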

Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
On Fri, Feb 3, 2017 at 10:27 PM, Mark Hamstra wrote: > yes > > On Fri, Feb 3, 2017 at 10:08 PM, kant kodali wrote: > >> can I use Spark Standalone with HDFS but no YARN? >> >> Thanks! >> > >

Re: can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread Mark Hamstra
yes On Fri, Feb 3, 2017 at 10:08 PM, kant kodali wrote: > can I use Spark Standalone with HDFS but no YARN? > > Thanks! >
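
A minimal sketch of what that looks like in practice (hostnames and paths are placeholders): the standalone master URL goes into the master setting, and HDFS is addressed purely through hdfs:// URIs, with YARN nowhere in the picture.

import org.apache.spark.sql.SparkSession

object HdfsOnStandalone {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HdfsOnStandalone")
      .master("spark://master-host:7077")  // standalone master, placeholder host
      .getOrCreate()
    // plain hdfs:// URI, resolved by the Hadoop client libraries -- no YARN needed
    val lines = spark.read.textFile("hdfs://namenode-host:8020/data/input.txt")
    println(lines.count())
    spark.stop()
  }
}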

can I use Spark Standalone with HDFS but no YARN

2017-02-03 Thread kant kodali
can I use Spark Standalone with HDFS but no YARN? Thanks!

Re: How do I dynamically add nodes to spark standalone cluster and be able to discover them?

2017-02-03 Thread kant kodali
Sorry, I should just do this: ./start-slave.sh spark://x.x.x.x:7077,y.y.y.y:7077,z.z.z.z:7077 But what about export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z"? Don't I need to have that on my worker node? Thanks! On Fri, Feb 3, 2017 at 4:57 PM, kant kodali wrote: > Hi,

Re: How do I dynamically add nodes to spark standalone cluster and be able to discover them?

2017-02-03 Thread kant kodali
Hi, How do I start a slave? Just run the start-slave.sh script? But then I don't understand the following. I put the following in spark-env.sh on the worker machine: export SPARK_MASTER_HOST="x.x.x.x y.y.y.y z.z.z.z" But start-slave.sh doesn't seem to take the SPARK_MASTER_HOST env variable, so I did

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Shashank Mandil
I may have found my problem. We have a Scala wrapper on top of spark-submit to run the shell command through Scala. We were kind of eating the exit code from spark-submit in that wrapper. When I looked at what the actual exit code was, stripping away the wrapper, I got 1. So I think spark-submit is
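
A minimal sketch of a wrapper that propagates the child's status instead of discarding it (illustrative only; the actual wrapper is not shown in the thread):

import scala.sys.process._

object SubmitWrapper {
  def main(args: Array[String]): Unit = {
    // .! runs the command, blocks, and returns its exit code
    val exitCode = Seq("spark-submit", "--class", "Test", "app.jar").!
    // hand the child's exit code to our own caller rather than eating it
    sys.exit(exitCode)
  }
}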

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Hollin Wilkins
Hey Asher, A phone call may be the best way to discuss all of this. But in short: 1. It is quite easy to add custom pipelines/models to MLeap. All of our out-of-the-box transformers can serve as good examples of how to do this. We are also putting together documentation on how to do this in our docs

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Jacek Laskowski
Hi,

➜ spark git:(master) ✗ ./bin/spark-submit whatever || echo $?
Error: Cannot load main class from JAR file:/Users/jacek/dev/oss/spark/whatever
Run with --help for usage help or --verbose for debug output
1

I see 1, and there are other cases for 1 too. Regards, Jacek Laskowski

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Ali Gouta
Hello, +1, I have exactly the same issue. I need the exit code so that Oozie can decide which actions to execute next. spark-submit always returns 0 when it catches the exception. From Spark 1.5 to 1.6.x, I still have the same issue... It would be great to fix it, or to know if there is some workaround

Re: Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Jacek Laskowski
Hi, An interesting case. You don't use Spark resources whatsoever. Creating a SparkConf does not use YARN...yet. I think any run mode would have the same effect. So, although spark-submit could have returned exit code 1, the use case touches Spark very little. What version is that? Do you see

Re: sqlContext vs spark.

2017-02-03 Thread Jacek Laskowski
Hi, Yes. Forget about SQLContext. It's been merged into SparkSession as of Spark 2.0 (the same goes for HiveContext). Long live SparkSession! :-) Jacek On 3 Feb 2017 7:48 p.m., "☼ R Nair (रविशंकर नायर)" <ravishankar.n...@gmail.com> wrote: All, In Spark 1.6.0, we used val jdbcDF =
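
A minimal sketch of the Spark 2.x equivalent (connection details are placeholders): SparkSession is the single entry point, and the old sqlContext.read JDBC pattern carries over to spark.read unchanged.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("JdbcExample")
  .getOrCreate()

// Spark 2.x replacement for sqlContext.read.format("jdbc")
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")  // placeholder URL
  .option("dbtable", "public.mytable")                   // placeholder table
  .option("user", "dbuser")
  .option("password", "secret")
  .load()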

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, I found a profile for Scala 2.11 and removed it. Now it brings in 2.10. I ran some code and got further, but now I get the error below when I do a “df.show”.

java.lang.AbstractMethodError at org.apache.spark.Logging$class.log(Logging.scala:50) at

Re: HBase Spark

2017-02-03 Thread Asher Krim
You can see in the tree what's pulling in 2.11. Your options then are either to shade them and add an explicit dependency on 2.10.5 in your pom, or to explore upgrading your project to 2.11 (which will require using a 2.11 build of Spark). On Fri, Feb 3, 2017 at 2:03 PM,

Spark submit on yarn does not return with exit code 1 on exception

2017-02-03 Thread Shashank Mandil
Hi All, I wrote a test script which always throws an exception, as below:

object Test {
  def main(args: Array[String]) {
    try {
      val conf = new SparkConf()
        .setAppName("Test")
      throw new RuntimeException("Some Exception")
      println("all done!")
    }
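
A minimal sketch of one way to make such a driver exit non-zero in client mode (an assumption about the intent, not the thread's resolution): catch the exception at the top of main and call sys.exit explicitly, so the launcher sees a non-zero status.

import org.apache.spark.SparkConf

object Test {
  def main(args: Array[String]): Unit = {
    try {
      val conf = new SparkConf().setAppName("Test")
      throw new RuntimeException("Some Exception")
    } catch {
      case e: Exception =>
        e.printStackTrace()
        sys.exit(1)  // explicit non-zero exit code for the caller to observe
    }
  }
}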

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, You’re right. I don’t see anything but 2.11 being pulled in. Do you know where I can change this? Cheers, Ben > On Feb 3, 2017, at 10:50 AM, Asher Krim wrote: > > Sorry for my persistence, but did you actually run "mvn dependency:tree > -Dverbose=true"? And did

Re: NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job

2017-02-03 Thread Anastasios Zouzias
Hi there, Are you sure that the cluster nodes where the executors run have network connectivity to the Elasticsearch cluster? Speaking of which, why don't you use https://github.com/elastic/elasticsearch-hadoop#apache-spark ? Cheers, Anastasios On Fri, Feb 3, 2017 at 7:10 PM, Dmitry Goldenberg
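
A minimal sketch of the suggested connector (assuming the elasticsearch-spark artifact for Scala 2.11 is on the classpath; hostnames and the index name are placeholders). It talks to Elasticsearch over HTTP on port 9200 rather than the transport protocol on 9300, which sidesteps one common source of NoNodeAvailableException:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._  // adds saveToEs to RDDs

val conf = new SparkConf()
  .setAppName("EsWrite")
  .set("es.nodes", "es-host-1,es-host-2")  // placeholder hosts, REST port 9200

val sc = new SparkContext(conf)
val docs = sc.makeRDD(Seq(
  Map("user" -> "alice", "count" -> 1),
  Map("user" -> "bob", "count" -> 2)
))
docs.saveToEs("myindex/mytype")  // placeholder index/type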

Re: HBase Spark

2017-02-03 Thread Asher Krim
Sorry for my persistence, but did you actually run "mvn dependency:tree -Dverbose=true"? And did you see only Scala 2.10.5 being pulled in? On Fri, Feb 3, 2017 at 12:33 PM, Benjamin Kim wrote: > Asher, > > It’s still the same. Do you have any other ideas? > > Cheers, > Ben >

sqlContext vs spark.

2017-02-03 Thread रविशंकर नायर
All, In Spark 1.6.0, we used val jdbcDF = sqlContext.read.format(-) for creating a data frame through JDBC. In Spark 2.1.x, we have seen this is val jdbcDF = *spark*.read.format(-) Does that mean we should not be using sqlContext going forward? Also, we see that sqlContext is not auto

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Asher Krim
I have a bunch of questions for you, Hollin: How easy is it to add support for custom pipelines/models? Are Spark MLlib models supported? We currently run Spark in local mode in an API service. It's not super terrible, but performance is a constant struggle. Have you benchmarked any performance

Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Thanks, Fernando. But I need to have only 1 row for a given user and date, with very low latency. So none of your options work for me. On Fri, Feb 3, 2017 at 10:34 AM, Fernando Avalos wrote: > Hi Shyla, > > Maybe I am wrong, but I can see two options here. > > 1.- Do some

Re: saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Hi All, I wanted to add more info: the first column is the user and the third is the period, and my key is (userid, date). For a given user and date combination I want to see only 1 row. My problem is that PT0H10M0S is overwritten by PT0H9M30S, even though the order of the rows in the RDD is
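
A minimal sketch of one workaround (column names and the keep-the-longer-duration rule are assumptions based on the sample rows): collapse the RDD to a single row per (userid, date) key before calling saveToCassandra, so Cassandra's last-write-wins upsert never sees two candidate rows for the same key.

import com.datastax.spark.connector._  // spark-cassandra-connector
import org.apache.spark.rdd.RDD

case class Usage(userid: String, date: String, period: String, day: String)

val rows: RDD[Usage] = ???  // the RDD from the thread

val deduped = rows
  .map(u => ((u.userid, u.date), u))
  .reduceByKey { (a, b) =>
    // assumption: the row with the longer ISO-8601 duration should win
    // (java.time.Duration requires Java 8)
    val da = java.time.Duration.parse(a.period)
    val db = java.time.Duration.parse(b.period)
    if (da.compareTo(db) >= 0) a else b
  }
  .values

deduped.saveToCassandra("my_keyspace", "usage_by_day")  // placeholder names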

NoNodeAvailableException (None of the configured nodes are available) error when trying to push data to Elastic from a Spark job

2017-02-03 Thread Dmitry Goldenberg
Hi, Any reason why we might be getting this error? The code seems to work fine in non-distributed mode, but the same code, when run from a Spark job, is not able to reach Elastic.

Spark version: 2.0.1, built for Hadoop 2.4, Scala 2.11
Elastic version: 2.3.1

I've verified the Elastic hosts and

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
Asher, It’s still the same. Do you have any other ideas? Cheers, Ben > On Feb 3, 2017, at 8:16 AM, Asher Krim wrote: > > Did you check the actual maven dep tree? Something might be pulling in a > different version. Also, if you're seeing this locally, you might want to >

Re: HBase Spark

2017-02-03 Thread Benjamin Kim
I'll clean up any .m2 or .ivy directories and try again. I ran this on our lab cluster for testing. Cheers, Ben On Fri, Feb 3, 2017 at 8:16 AM Asher Krim wrote: > Did you check the actual Maven dep tree? Something might be pulling in a > different version. Also, if you're

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Hollin Wilkins
Hey Aseem, We have built pipelines that execute several string indexers, one-hot encoders, scaling, and a random forest or linear regression at the end. Execution time for the linear regression was on the order of 11 microseconds, a bit longer for random forest. This can be further optimized by

Re: HBase Spark

2017-02-03 Thread Asher Krim
Did you check the actual Maven dep tree? Something might be pulling in a different version. Also, if you're seeing this locally, you might want to check which version of the Scala SDK your IDE is using. Asher Krim Senior Software Engineer On Thu, Feb 2, 2017 at 5:43 PM, Benjamin Kim

Is DoubleWritable and DoubleObjectInspector doing the same thing in Hive UDF?

2017-02-03 Thread Alex
Hi, can you guys tell me if the two pieces of code below return the same thing?

(((DoubleObjectInspector) ins2).get(obj));

and

((DoubleWritable) obj).get();

From the two code snippets below, code 1)

public Object get(Object name) {
  int pos = getPos((String) name);
  if (pos < 0) return null;

problem with the method JavaDStream.foreachRDD() SparkStreaming

2017-02-03 Thread Hamza HACHANI
Hi, I'm new to Spark Streaming. I'm using the 2.10 builds of spark-core and spark-streaming. My issue is that when I try to use JavaPairDStream.foreachRDD:

test.foreachRDD(new Function,Void>() {
  public Void call(JavaPairRDD rdd) {

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Aseem Bansal
Does this support Java 7? On Fri, Feb 3, 2017 at 5:30 PM, Aseem Bansal wrote: > Is computational time for predictions on the order of few milliseconds (< > 10 ms) like the old mllib library? > > On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins wrote: >

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-03 Thread Aseem Bansal
Is computational time for predictions on the order of few milliseconds (< 10 ms) like the old mllib library? On Thu, Feb 2, 2017 at 10:12 PM, Hollin Wilkins wrote: > Hey everyone, > > > Some of you may have seen Mikhail and I talk at Spark/Hadoop Summits about > MLeap and how

Bipartite projection with Graphx

2017-02-03 Thread balaji9058
Hi, Is a bipartite projection possible with GraphX?

Rdd1
#id name
1 x1
2 x2
3 x3
4 x4
5 x5
6 x6
7 x7
8 x8

Rdd2
#id name
10001 y1
10002 y2
10003 y3
10004 y4
10005 y5
10006 y6

EdgeList
#src id Dest id
1 10001
1 10002
2
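
A minimal sketch of one way to do the projection (an assumption about the goal: connect two Rdd1 vertices whenever they share an Rdd2 neighbour). GraphX has no built-in bipartite-projection operator, so this builds the projected edge set with plain RDD operations, using a few sample pairs from the post:

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

// sc is an existing SparkContext
val edges: RDD[(Long, Long)] = sc.parallelize(Seq(
  (1L, 10001L), (1L, 10002L), (2L, 10001L)
))

val projected: RDD[Edge[Int]] = edges
  .map { case (src, dst) => (dst, src) }  // group left vertices by shared right vertex
  .groupByKey()
  .flatMap { case (_, lefts) =>
    val l = lefts.toSeq
    for (a <- l; b <- l if a < b) yield Edge(a, b, 1)  // pair up co-neighbours
  }
  .distinct()

val projectedGraph = Graph.fromEdges(projected, defaultValue = 0)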

Re: Surprised!!!!! Spark-shell showing inconsistent results

2017-02-03 Thread Alex
Hi Team, Actually I figured out something: while the Hive Java UDF executed in Hive gives output with 10-decimal precision, in Spark the same UDF gives results rounded off to 6-decimal precision... How do I stop that? It's the same Java UDF jar file used in both Hive and Spark.. [image:

saveToCassandra issue. Please help

2017-02-03 Thread shyla deshpande
Hello All, This is the content of my RDD, which I am saving to a Cassandra table. But it looks like the 2nd row is written first and then the first row overwrites it, so I end up with bad output.

(494bce4f393b474980290b8d1b6ebef9, 2017-02-01, PT0H9M30S, WEDNESDAY)
(494bce4f393b474980290b8d1b6ebef9,