Local spark jars not being detected

2015-06-20 Thread Ritesh Kumar Singh
Hi, I'm using the IntelliJ IDE for my Spark project. I've compiled Spark 1.3.0 for Scala 2.11.4, and here's one of the compiled jars installed in my m2 folder: ~/.m2/repository/org/apache/spark/spark-core_2.11/1.3.0/spark-core_2.11-1.3.0.jar But when I add this dependency in my pom file for the

Re: Abount Jobs UI in yarn-client mode

2015-06-20 Thread Steve Loughran
On 19 Jun 2015, at 16:48, Sea 261810...@qq.com wrote: Hi, all: I run Spark on YARN and want to see the Jobs UI at http://ip:4040/, but it redirects to http://${yarn.ip}/proxy/application_1428110196022_924324/, which cannot be found. Why? Can anyone help? whenever you point

Verifying number of workers in Spark Streaming

2015-06-20 Thread anshu shukla
How can I tell whether all the machines/worker nodes are being used in stream processing over a cluster of 8 machines (my cluster has 8 slaves)? -- Thanks & Regards, Anshu Shukla

Re: Web UI vs History Server Bugs

2015-06-20 Thread Steve Loughran
On 17 Jun 2015, at 19:10, jcai jonathon@yale.edu wrote: Hi, I am running this on Spark stand-alone mode. I find that when I examine the web UI, a couple bugs arise: 1. There is a discrepancy between the number denoting the duration of the application when I run the history server

Re: Local spark jars not being detected

2015-06-20 Thread Akhil Das
Not sure, but try removing the provided scope, or create a lib directory in the project home and bring that jar over there. On 20 Jun 2015 18:08, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, I'm using the IntelliJ IDE for my Spark project. I've compiled Spark 1.3.0 for Scala 2.11.4 and

Spark SQL JDBC Source data skew

2015-06-20 Thread Sathish Kumaran Vairavelu
Hi, In the Spark SQL JDBC data source there is an option to specify upper/lower bounds and the number of partitions. How does Spark handle data distribution if we do not give the upper/lower bounds and number of partitions? Will all data from the external data source end up skewed in one executor? In many situations, we do not
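
For illustration, the bounds and partition count mentioned in this question can be thought of as splitting a numeric column into equal strides, with one WHERE clause per partition. The following is a simplified sketch in plain Scala under that assumption, not Spark's actual implementation; the names `JdbcRangeSketch` and `rangePredicates` and the column `id` are hypothetical.

```scala
// Simplified sketch (hypothetical names): derive one range predicate per
// partition from a lower bound, an upper bound, and a partition count.
object JdbcRangeSketch {
  def rangePredicates(column: String, lower: Long, upper: Long, numPartitions: Int): Seq[String] = {
    require(lower < upper, "lower bound must be below upper bound")
    require(numPartitions >= 2, "a single partition needs no predicate")
    val stride = (upper - lower) / numPartitions
    (0 until numPartitions).map { i =>
      val lo = lower + i * stride
      val hi = lo + stride
      if (i == 0) s"$column < $hi"                       // first range is open below
      else if (i == numPartitions - 1) s"$column >= $lo" // last range is open above
      else s"$column >= $lo AND $column < $hi"
    }
  }

  def main(args: Array[String]): Unit =
    rangePredicates("id", 0L, 100L, 4).foreach(println)
}
```

In this sketch the bounds only set the stride: rows below the lower bound land in the first range and rows above the upper bound in the last, so a column whose values cluster unevenly still yields skewed partitions.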

Re: Velox Model Server

2015-06-20 Thread Charles Earl
Is velox NOT open source? On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote: Hi, The demo of end-to-end ML pipeline including the model server component at Spark Summit was really cool. I was wondering if the Model Server component is based upon Velox or it uses a

RE: Code review - Spark SQL command-line client for Cassandra

2015-06-20 Thread Mohammed Guller
It is a simple Play-based web application. It exposes a URI for submitting a SQL query. It then executes that query using the CassandraSQLContext provided by the Spark Cassandra Connector. Since it is web-based, I added an authentication and authorization layer to make sure that only users with the

Spark 1.4 History Server - HDP 2.2

2015-06-20 Thread Ashish Soni
Can anyone help? I am getting the error below when I try to start the History Server. I do not see any org.apache.spark.deploy.yarn.history package inside the assembly jar, and I am not sure how to get it. java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.history.YarnHistoryProvider Thanks,

Re: createDirectStream and Stats

2015-06-20 Thread Silvio Fiorito
Are you sure you were using all 100 executors even with the receiver model? Because in receiver mode, the number of partitions is dependent on the batch duration and block interval. It may not necessarily map directly to the number of executors in your app unless you've adjusted the block
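
The dependency on batch duration and block interval described above can be made concrete with a little arithmetic. This is a minimal sketch, assuming each block interval yields one non-empty block (and hence one partition) per receiver; the object and function names are hypothetical, and the 200 ms figure is the default of `spark.streaming.blockInterval`.

```scala
// Rough estimate of partitions per batch in receiver mode (hypothetical helper).
object ReceiverPartitions {
  def partitionsPerBatch(batchDurationMs: Long, blockIntervalMs: Long, receivers: Int = 1): Long = {
    require(blockIntervalMs > 0, "block interval must be positive")
    // One block per block interval, per receiver, within a single batch.
    (batchDurationMs / blockIntervalMs) * receivers
  }

  def main(args: Array[String]): Unit = {
    // A 2 s batch with a 200 ms block interval and one receiver.
    println(partitionsPerBatch(2000L, 200L)) // prints 10
  }
}
```

Under these assumptions a 2-second batch gives only 10 tasks per receiver, so an application with 100 executors would leave most of them idle unless the block interval is reduced or more receivers are added.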

Fwd: Verifying number of workers in Spark Streaming

2015-06-20 Thread anshu shukla
Any suggestions, please? How can I tell whether all the machines/worker nodes are being used in stream processing over a cluster of 8 machines (my cluster has 8 slaves)? I am submitting the job from the master itself on the EC2 cluster created by the EC2 scripts available with Spark. But I am

Re: Local spark jars not being detected

2015-06-20 Thread Ritesh Kumar Singh
Yes, finally solved. It was there in front of my eyes all the time. Thanks a lot, Pete.

Re: Velox Model Server

2015-06-20 Thread Sandy Ryza
Hi Debasish, The Oryx project (https://github.com/cloudera/oryx), which is Apache 2 licensed, contains a model server that can serve models built with MLlib. -Sandy On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl charles.ce...@gmail.com wrote: Is velox NOT open source? On Saturday, June 20,

How to get the ALS reconstruction error

2015-06-20 Thread afarahat
Hello; I am fitting ALS models and would like to get an initial idea of the number of factors. I want to use the reconstruction error on the training data as a measure. Does the API expose the reconstruction error? Thanks, Ayman

RE: Code review - Spark SQL command-line client for Cassandra

2015-06-20 Thread shahid ashraf
Hi Mohammad, can you provide more info about the service you developed? On Jun 20, 2015 7:59 AM, Mohammed Guller moham...@glassbeam.com wrote: Hi Matthew, It looks fine to me. I have built a similar service that allows a user to submit a query from a browser and returns the result in JSON

RE: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Andrew Lee
Hi Roberto, I'm not an EMR person, but it looks like option -h is deploying the necessary DataNucleus JARs for you. The requirements for HiveContext are the hive-site.xml and the DataNucleus JARs. As long as these 2 are there, and Spark is compiled with -Phive, it should work. spark-shell runs in

Re: Local spark jars not being detected

2015-06-20 Thread Pete Zybrick
It looks like you are using parens instead of curly braces on scala.version. On Jun 20, 2015, at 8:38 AM, Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: Hi, I'm using the IntelliJ IDE for my Spark project. I've compiled Spark 1.3.0 for Scala 2.11.4, and here's one of the

Velox Model Server

2015-06-20 Thread Debasish Das
Hi, The demo of end-to-end ML pipeline including the model server component at Spark Summit was really cool. I was wondering if the Model Server component is based upon Velox or it uses a completely different architecture. https://github.com/amplab/velox-modelserver We are looking for an open

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Bozeman, Christopher
We worked it out. There were multiple items (like the location of the remote metastore and db user auth) needed to make HiveContext happy in yarn-cluster mode. For reference: https://github.com/awslabs/emr-bootstrap-actions/blob/master/spark/examples/using-hivecontext-yarn-cluster.md -Christopher Bozeman On

Re: Velox Model Server

2015-06-20 Thread Donald Szeto
Mind if I ask what 1.3/1.4 ML features you are looking for? On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote: After getting used to Scala, writing Java is too much work :-) I am looking for a Scala-based project that's using netty at its core (spray is one example).

Re: Velox Model Server

2015-06-20 Thread Debasish Das
Integration of the model server with the ML pipeline API. On Sat, Jun 20, 2015 at 12:25 PM, Donald Szeto don...@prediction.io wrote: Mind if I ask what 1.3/1.4 ML features you are looking for? On Saturday, June 20, 2015, Debasish Das debasish.da...@gmail.com wrote: After getting used to

Re: Abount Jobs UI in yarn-client mode

2015-06-20 Thread Gavin Yue
I got the same problem when I upgraded from 1.3.1 to 1.4. The same conf is used: 1.3 works, but the 1.4 UI does not. So I added <property> <name>yarn.resourcemanager.webapp.address</name> <value>:8088</value> </property> <property> <name>yarn.resourcemanager.hostname</name>

Re: [Spark 1.3.1 on YARN on EMR] Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient

2015-06-20 Thread Roberto Coluccio
I confirm, Christopher was very kind helping me out here. The solution presented in the linked doc worked perfectly. IMO it should be linked in the official Spark documentation. Thanks again, Roberto On 20 Jun 2015, at 19:25, Bozeman, Christopher bozem...@amazon.com wrote: We worked it

Re: Velox Model Server

2015-06-20 Thread Sandy Ryza
Oops, that link was for Oryx 1. Here's the repo for Oryx 2: https://github.com/OryxProject/oryx On Sat, Jun 20, 2015 at 10:20 AM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Debasish, The Oryx project (https://github.com/cloudera/oryx), which is Apache 2 licensed, contains a model server

Re: Velox Model Server

2015-06-20 Thread Debasish Das
After getting used to Scala, writing Java is too much work :-) I am looking for a Scala-based project that's using netty at its core (spray is one example). prediction.io is an option, but it also looks quite complicated and does not use all the ML features that were added in 1.3/1.4. Velox built on

Re: Submitting Spark Applications using Spark Submit

2015-06-20 Thread Raghav Shankar
Hey Andrew, I tried the following approach: I modified my Spark build on my local machine. I downloaded the Spark 1.4.0 source code and then made a change to ResultTask.scala (a simple change to see if it works: I added a print statement). Now, I built Spark using mvn

Re: Grouping elements in a RDD

2015-06-20 Thread Corey Nolet
If you use rdd.mapPartitions(), you'll be able to get hold of the iterator for each partition. Then you should be able to do iterator.grouped(size) on each of the partitions. I think it may mean you have one group at the end of each partition that has fewer than size elements. If that's okay
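
The effect of grouping inside each partition can be seen without a cluster. The sketch below models partitions as plain Scala sequences and applies `iterator.grouped(size)` to each one, which is what `rdd.mapPartitions(_.grouped(size))` would do per partition; the object and function names are hypothetical, and no Spark dependency is assumed.

```scala
// Plain-Scala model of per-partition grouping (hypothetical helper names).
object PartitionGrouping {
  def groupWithinPartitions[A](partitions: Seq[Seq[A]], size: Int): Seq[List[List[A]]] =
    // Each inner Seq stands in for the iterator of one RDD partition.
    partitions.map(_.iterator.grouped(size).map(_.toList).toList)

  def main(args: Array[String]): Unit = {
    val partitions = Seq(Seq(1, 2, 3, 4, 5), Seq(6, 7, 8))
    groupWithinPartitions(partitions, 2).foreach(println)
    // prints List(List(1, 2), List(3, 4), List(5))
    // prints List(List(6, 7), List(8))
  }
}
```

Groups never span partitions here, so each partition can end with one short group, as the reply anticipates.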

Re: Serial batching with Spark Streaming

2015-06-20 Thread Tathagata Das
No, it does not. By default, batch X+1 is started only after all the retries etc. related to batch X are done. Yes, one RDD per batch per DStream. However, the RDD could be a union of multiple RDDs (e.g. RDDs generated by a windowed DStream, or a unioned DStream). TD On Fri, Jun 19, 2015 at

Load slf4j from the job assembly instead of from the Spark jar

2015-06-20 Thread Mario Pastorelli
Hi everyone, I'm trying to use the logstash-logback-encoder https://github.com/logstash/logstash-logback-encoder in my spark jobs but I'm having some problems with the Spark classloader. The logstash-logback-encoder uses a special version of the slf4j BasicMarker

Re: Serial batching with Spark Streaming

2015-06-20 Thread Michal Čizmazia
Thank you very much for the confirmation. On 20 June 2015 at 17:21, Tathagata Das t...@databricks.com wrote: No, it does not. By default, batch X+1 is started only after all the retries etc. related to batch X are done. Yes, one RDD per batch per DStream. However, the RDD could be a

Grouping elements in a RDD

2015-06-20 Thread Brandon White
How would you do a .grouped(10) on an RDD? Is it possible? Here is an example for a Scala list: scala> List(1,2,3,4).grouped(2).toList res1: List[List[Int]] = List(List(1, 2), List(3, 4)) I would like to group n elements.

How could output the StreamingLinearRegressionWithSGD prediction result?

2015-06-20 Thread Gavin Yue
Hey, I am testing StreamingLinearRegressionWithSGD following the tutorial. It works, but I could not output the prediction results. I tried saveAsTextFile, but it only wrote _SUCCESS to the folder. I am trying to check the prediction results and use BinaryClassificationMetrics to get

Task Serialization Error on DataFrame.foreachPartition

2015-06-20 Thread Nishant Patel
Hi, I am loading data from a Hive table to HBase after doing some manipulation. I am getting the error 'Task not Serializable'. My code is as below. public class HiveToHbaseLoader implements Serializable { public static void main(String[] args) throws Exception { String