How to randomise data on spark

2015-03-25 Thread critikaled
How to randomise data across all partitions and merge them into one? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-randomise-data-on-spark-tp2.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Kafka Version Update 0.8.2 status?

2015-02-10 Thread critikaled
When can we expect support for the latest Kafka and Scala 2.11 in Spark Streaming? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Version-Update-0-8-2-status-tp21573.html

Re: Spark 1.1 (slow, working), Spark 1.2 (fast, freezing)

2015-01-21 Thread critikaled
I'm also facing the same issue. Is this a bug? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-1-slow-working-Spark-1-2-fast-freezing-tp21278p21283.html

is there documentation on spark sql catalyst?

2015-01-19 Thread critikaled
Where can I find good documentation on Spark SQL Catalyst? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/is-there-documentation-on-spark-sql-catalyst-tp21232.html

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread critikaled
+1, I need to know this too. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-automatically-run-different-stages-concurrently-when-possible-tp21075p21233.html

Re: Does Spark automatically run different stages concurrently when possible?

2015-01-19 Thread critikaled
Hi John and David, I tried this to run them concurrently: List(RDD1, RDD2, ...).par.foreach { rdd => rdd.collect().foreach(println) }. This successfully registered the tasks, but the parallelism of the stages is limited: sometimes it was able to run 4 of them and sometimes only one.
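A minimal sketch of the same driver-side idea, with each action submitted from its own thread, but using an explicitly sized thread pool instead of `.par`. A parallel collection runs on a shared default pool whose size is typically tied to the number of available cores. Assumptions: plain Scala collections stand in for RDD1, RDD2, ... so the sketch runs without Spark, and the names `ConcurrentActions`, `runAll` and `maxConcurrent` are hypothetical. In a real job each thunk would call an RDD action such as `rdd.collect()`. Per Spark's documentation on scheduling within an application, jobs submitted from separate threads may run simultaneously, subject to the scheduler mode (FIFO by default, FAIR via `spark.scheduler.mode`) and free cluster resources.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ConcurrentActions {
  // Hypothetical helper: runs each action in its own Future on a fixed-size
  // pool, so at most `maxConcurrent` actions execute at the same time.
  // Future.sequence keeps the results in the same order as the input list.
  def runAll[A](actions: List[() => A], maxConcurrent: Int): List[A] = {
    val pool = Executors.newFixedThreadPool(maxConcurrent)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      val futures: List[Future[A]] = actions.map(action => Future(action()))
      Await.result(Future.sequence(futures), 10.minutes)
    } finally {
      // Pool threads are non-daemon, so shut the pool down to let the JVM exit.
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    // Stand-ins for RDD actions such as rdd.collect() or rdd.count().
    val actions = List(() => (1 to 10).sum, () => (1 to 100).sum, () => (1 to 1000).sum)
    println(runAll(actions, maxConcurrent = 4)) // prints List(55, 5050, 500500)
  }
}
```

Raising `maxConcurrent` only lets the driver submit more jobs at once. Whether those jobs actually overlap still depends on the scheduler and on how many executor cores are free.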

Spark 1.2.0 ec2 launch script hadoop native libraries not found warning

2015-01-08 Thread critikaled
Hi, I'm facing this error on a Spark EC2 cluster: when a job is submitted, it says that the native Hadoop libraries are not found. I have checked spark-env.sh and all the folders in the path but am unable to find the problem, even though the folders contain the libraries. Are there any performance drawbacks if we use

How to set up spark sql on ec2

2014-12-29 Thread critikaled
How can I make the spark-ec2 script install Hive and Spark SQL on EC2? When I run the spark-ec2 script, go to bin, run ./spark-sql and execute a query, I get "connection refused" on master:9000. What else has to be configured for this?

What are all the Hadoop Major Versions in spark-ec2 script?

2014-12-29 Thread critikaled
So what should the value of --hadoop-major-version be for the following Hadoop versions? Hadoop 1.x is 1, but what about CDH4, Hadoop 2.3, Hadoop 2.4, MapR 3.x and MapR 4.x?

Serious issues with class not found exceptions of classes in uber jar

2014-12-26 Thread critikaled
Hi, I'm facing serious issues with my Spark application not recognizing the classes in the uber jar: sometimes it recognizes them and sometimes it does not. Sometimes even adding external jars using setJars does not help. Is anyone else facing a similar issue? I'm using the latest version, 1.2.0.

Re: Serious issues with class not found exceptions of classes in uber jar

2014-12-26 Thread critikaled
Will this output from stderr help?
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/26 10:13:44 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/12/26 10:13:44 WARN NativeCodeLoader: Unable to load native-hadoop library

Re: How to insert complex types like map<string,map<string,int>> in spark sql

2014-11-25 Thread critikaled
https://github.com/apache/spark/blob/84d79ee9ec47465269f7b0a7971176da93c96f3f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala It doesn't look like Spark SQL supports nested complex types right now.

Re: How to insert complex types like map<string,map<string,int>> in spark sql

2014-11-25 Thread critikaled
Exactly, that seems to be the problem. I will have to wait for the next release. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-complex-types-like-map-string-map-string-int-in-spark-sql-tp19603p19734.html

Re: How to insert complex types like map<string,map<string,int>> in spark sql

2014-11-24 Thread critikaled
Thanks for the reply, Michael. Here is the stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0 (TID 3, localhost): scala.MatchError: MapType(StringType,StringType,true) (of class

How to insert complex types like map<string,map<string,int>> in spark sql

2014-11-23 Thread critikaled
Hi, I am trying to insert a particular set of data from an RDD into a Hive table. I have a Map[String,Map[String,Int]] in Scala which I want to insert into a table column of type map<string,map<string,int>>. I was able to create the table, but while inserting it says scala.MatchError:

How to retrieve spark context when hiveContext is used in spark streaming

2014-10-29 Thread critikaled
Hi, I'm trying to get hold of the SparkContext from a HiveContext or StreamingContext. I have two pieces of code, one in core Spark and one in Spark Streaming. The plain Spark code with Hive gives me the context, but the Spark Streaming code with Hive prints null. Please help me figure out how to make this code

Re: Spark Streaming and Storm

2014-10-28 Thread critikaled
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-Storm-tp9118p17530.html

Re: RDD to Multiple Tables SparkSQL

2014-10-28 Thread critikaled
mean by extract? Could you direct me to the API or a code sample? Thanks and regards, critikaled. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-Multiple-Tables-SparkSQL-tp16807p17536.html

Re: Is Spark in Java a bad idea?

2014-10-28 Thread critikaled
Hi Ron, whatever API you have in Scala, you can generally use it from Java too: Scala is interoperable with Java and vice versa. Because Scala is both object-oriented and functional, it will make your job easier on the JVM, and it is more concise than Java. Take it as an opportunity and start learning Scala ;).

RDD to Multiple Tables SparkSQL

2014-10-20 Thread critikaled
Hi, I have an RDD which I want to register as multiple tables based on key:
val context = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(context)
import sqlContext.createSchemaRDD
case class KV(key:String,id:String,value:String)
val logsRDD =

any good library to implement multilabel classification on spark?

2014-10-03 Thread critikaled
Hi, going through the Spark MLlib docs, I noticed that it supports multiclass classification. Can anybody help me implement multilabel classification on Spark, as in the Mulan (http://mulan.sourceforge.net/index.html) and Meka (http://meka.sourceforge.net/) libraries?