How to randomise data across all partitions and merge them into one?
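One way to do it (a sketch of my own, not from the thread; rdd stands in for the input RDD): tag each element with a random key, sort on that key to shuffle rows across all partitions, then merge the result into a single partition.

import org.apache.spark.SparkContext._ // pair-RDD implicits on Spark 1.x
import scala.util.Random

// tag with a random key, shuffle by sorting on it, then merge to one partition
val randomised = rdd
  .map(x => (Random.nextDouble(), x))
  .sortByKey()
  .values
  .coalesce(1)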
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-randomise-data-on-spark-tp2.html
When can we expect support for the latest Kafka and Scala 2.11 in Spark Streaming?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Version-Update-0-8-2-status-tp21573.html
I'm also facing the same issue. Is this a bug?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-1-slow-working-Spark-1-2-fast-freezing-tp21278p21283.html
Where can I find good documentation on Spark SQL Catalyst?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/is-there-documentation-on-spark-sql-catalyst-tp21232.html
+1, I too need to know.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Does-Spark-automatically-run-different-stages-concurrently-when-possible-tp21075p21233.html
Hi John and David,
I tried this to run them concurrently:

List(RDD1, RDD2 /* … */).par.foreach { rdd =>
  rdd.collect().foreach(println)
}

This successfully registered the tasks, but the parallelism of the stages was limited: sometimes it ran four of them at once, sometimes only one.
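One likely cause (my assumption, not something stated in the thread): Scala parallel collections run on a fork-join pool sized to the number of CPU cores, which caps how many collect() jobs are submitted at once. A minimal sketch of widening that cap with an explicit task support (the pool size 16 is arbitrary):

import scala.collection.parallel.ForkJoinTaskSupport
import scala.concurrent.forkjoin.ForkJoinPool

val rdds = List(RDD1, RDD2 /* … */).par
// default pool size == number of cores; set it explicitly to raise the cap
rdds.tasksupport = new ForkJoinTaskSupport(new ForkJoinPool(16))
rdds.foreach(rdd => rdd.collect().foreach(println))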
Hi,
I'm facing this error on a Spark EC2 cluster: when a job is submitted, it says that the native Hadoop libraries were not found. I have checked spark-env.sh and all the folders on the path, but I am unable to find the problem even though the folders do contain the libraries. Are there any performance drawbacks if we use…
How do I make the spark-ec2 script install Hive and Spark SQL on EC2? When I run the spark-ec2 script, go to bin, run ./spark-sql, and execute a query, I get "connection refused" on master:9000. What else has to be configured for this?
--
So what should be the value of --hadoop-major-version for the following Hadoop versions?
Hadoop 1.x is 1
CDH4
Hadoop 2.3
Hadoop 2.4
MapR 3.x
MapR 4.x
--
Hi, I'm facing serious issues with my Spark application not recognizing the classes in the uber jar: sometimes it recognizes them, sometimes it does not. Even adding the external jars using setJars does not always help. Is anyone else facing a similar issue? I'm using the latest 1.2.0 version.
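For context, setJars here refers to listing the application jars on the SparkConf so they are shipped to the executors; a minimal sketch (the app name and jar path are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("uber-jar-test") // hypothetical app name
  .setJars(Seq("/path/to/app-assembly-1.2.0.jar")) // hypothetical path
val sc = new SparkContext(conf)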
--
Will this output from stderr help?
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/12/26 10:13:44 INFO CoarseGrainedExecutorBackend: Registered signal handlers for [TERM, HUP, INT]
14/12/26 10:13:44 WARN NativeCodeLoader: Unable to load native-hadoop library
https://github.com/apache/spark/blob/84d79ee9ec47465269f7b0a7971176da93c96f3f/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
It doesn't look like Spark SQL supports nested complex types right now.
--
Exactly, that seems to be the problem; we will have to wait for the next release.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-insert-complex-types-like-map-string-map-string-int-in-spark-sql-tp19603p19734.html
Thanks for the reply, Michael. Here is the stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in
stage 0.0 failed 1 times, most recent failure: Lost task 3.0 in stage 0.0
(TID 3, localhost): scala.MatchError: MapType(StringType,StringType,true)
(of class
Hi,
I am trying to insert a particular set of data from an RDD into a Hive table. I have a Map[String,Map[String,Int]] in Scala which I want to insert into a table column of type map<string,map<string,int>>. I was able to create the table, but while inserting it says scala.MatchError:
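For reference, a minimal sketch of the pattern that seems to be failing, written against the Spark 1.2-era API (all names are made up, sc is an existing SparkContext, and the final INSERT is where the MatchError would surface):

import org.apache.spark.sql.hive.HiveContext

case class Rec(id: String, props: Map[String, Map[String, Int]])

val hiveContext = new HiveContext(sc)
import hiveContext.createSchemaRDD // implicit RDD -> SchemaRDD conversion

val recs = sc.parallelize(Seq(Rec("a", Map("outer" -> Map("inner" -> 1)))))
recs.registerTempTable("staging")

hiveContext.sql("CREATE TABLE nested (id string, props map<string,map<string,int>>)")
// scala.MatchError: MapType(StringType,...) would be thrown here
hiveContext.sql("INSERT INTO TABLE nested SELECT id, props FROM staging")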
Hi,
I'm trying to get hold of the SparkContext from a HiveContext or a StreamingContext. I have two pieces of code, one in core Spark and one in Spark Streaming: the plain Spark code with Hive gives me the context, while the Spark Streaming code with Hive prints null. Please help me figure out how to make this code…
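For what it's worth, in the 1.x APIs both contexts expose the underlying SparkContext directly; a sketch (the names are mine, not the poster's code):

import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("ctx-demo") // hypothetical app name
val ssc = new StreamingContext(conf, Seconds(10))

// both of these refer to the same underlying SparkContext
val scFromStreaming = ssc.sparkContext
val hiveContext = new HiveContext(scFromStreaming)
val scFromHive = hiveContext.sparkContext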
http://www.cs.berkeley.edu/~matei/papers/2013/sosp_spark_streaming.pdf
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-Storm-tp9118p17530.html
What do you mean by "extract"? Could you direct me to an API or a code sample?
Thanks and regards,
critikaled.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-to-Multiple-Tables-SparkSQL-tp16807p17536.html
Hi Ron,
Whatever API you have in Scala, you can use from Java: Scala is interoperable with Java and vice versa. Scala, being both object-oriented and functional, will make your job on the JVM easier, and it is more concise than Java. Take it as an opportunity and start learning Scala ;).
Hi, I have an RDD which I want to register as multiple tables based on a key:

val context = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(context)
import sqlContext.createSchemaRDD

case class KV(key: String, id: String, value: String)

val logsRDD =
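A possible continuation (my sketch, assuming logsRDD ends up as an RDD[KV]; the table-name prefix is made up): collect the distinct keys, then register one filtered temp table per key.

// hypothetical continuation, assuming logsRDD: RDD[KV]
val keys = logsRDD.map(_.key).distinct().collect()
keys.foreach { k =>
  // createSchemaRDD implicitly converts the filtered RDD[KV] to a SchemaRDD
  logsRDD.filter(_.key == k).registerTempTable("logs_" + k)
}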
Hi, going through the Spark MLlib docs I have noticed that it supports multiclass classification. Can anybody help me implement multilabel classification on Spark, like in the Mulan (http://mulan.sourceforge.net/index.html) and Meka (http://meka.sourceforge.net/) libraries?
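One way to approximate this without a dedicated library (my sketch, not a built-in MLlib feature) is binary relevance: train one binary classifier per label, and take as an instance's predicted label set every label whose classifier outputs 1.0. Here data and numLabels are hypothetical inputs.

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// data: features paired with the set of label indices (0 until numLabels)
def trainMultilabel(data: RDD[(Vector, Set[Int])], numLabels: Int) =
  (0 until numLabels).map { label =>
    val binary = data.map { case (features, labels) =>
      LabeledPoint(if (labels.contains(label)) 1.0 else 0.0, features)
    }
    label -> LogisticRegressionWithSGD.train(binary, 100) // 100 iterations
  }.toMap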