custom join using complex keys

2015-05-09 Thread Mathieu D
Hi folks, I need to join RDDs having composite keys like this: (K1, K2, ..., Kn). The joining rule is: * if left.K1 == right.K1, then we have a true equality, and all of K2 ... Kn are also equal. * if left.K1 != right.K1 but left.K2 == right.K2, I have a partial equality, and I also
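A plain-Python sketch of the matching rule described above (not Spark code; the function name and return values are my own, chosen for illustration):

```python
def match_keys(left, right):
    """Compare two composite keys (K1, K2, ..., Kn) using the rule
    described above: equal K1 means a full match (the remaining
    components are assumed equal too); otherwise a matching K2
    alone yields only a partial match."""
    if left[0] == right[0]:
        return "full"
    if left[1] == right[1]:
        return "partial"
    return None

# One example of each case:
print(match_keys(("a", "x", 1), ("a", "x", 1)))  # full
print(match_keys(("a", "x", 1), ("b", "x", 2)))  # partial
print(match_keys(("a", "x", 1), ("b", "y", 2)))  # None
```

In Spark this predicate would have to be expressed as a key extraction per join pass (e.g. one join keyed on K1, a second keyed on K2 for the leftovers), since RDD joins only support exact key equality.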

Spark streaming closes with Cassandra Conector

2015-05-09 Thread Sergio Jiménez Barrio
I am trying to save some data to Cassandra from an app with Spark Streaming: Messages.foreachRDD { . . . CassandraRDD.saveToCassandra(test,test) } When I run it, the app closes when I receive data or can't connect to Cassandra. Any ideas? Thanks -- Atte. Sergio Jiménez

Can NullWritable not be used in Spark?

2015-05-09 Thread donhoff_h
Hi, experts. I wrote a Spark program to write a sequence file. I found that if I used NullWritable as the key class of the SequenceFile, the program threw exceptions, but if I used BytesWritable or Text as the key class, it did not. Does Spark not

Re: Spark can not access jar from HDFS !!

2015-05-09 Thread Michael Armbrust
That code path is entirely delegated to Hive. Does Hive support this? You might try using sparkContext.addJar instead. On Sat, May 9, 2015 at 12:32 PM, Ravindra ravindra.baj...@gmail.com wrote: Hi All, I am trying to create custom UDFs with hiveContext as given below - scala

Re: spark and binary files

2015-05-09 Thread ayan guha
Spark uses any InputFormat you specify, and the number of splits equals the number of RDD partitions. You may want to take a deeper look at SparkContext.newAPIHadoopRDD to load your data. On Sat, May 9, 2015 at 4:48 PM, tog guillaume.all...@gmail.com wrote: Hi, I have an application that currently runs using
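For intuition, here is a plain-Python sketch (no Spark dependency; the fixed-width record layout is invented for illustration) of the kind of record extraction a custom InputFormat/RecordReader performs on a proprietary binary file. Each split is processed this way, and each split becomes one RDD partition:

```python
import struct
from io import BytesIO

def read_records(stream, fmt="<ih"):
    """Read fixed-width records until EOF. The layout here (a
    little-endian int32 id followed by an int16 value) is invented;
    a real reader would follow the proprietary format's spec."""
    size = struct.calcsize(fmt)
    while True:
        chunk = stream.read(size)
        if len(chunk) < size:
            return
        yield struct.unpack(fmt, chunk)

# Two packed records standing in for a binary file on HDFS:
data = struct.pack("<ih", 1, 10) + struct.pack("<ih", 2, 20)
records = list(read_records(BytesIO(data)))
print(records)  # [(1, 10), (2, 20)]
```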

Re: Submit Spark application in cluster mode and supervised

2015-05-09 Thread James King
Many thanks, Silvio. What I found out later is that if there is a catastrophic failure and all the daemons fail at the same time before any fail-over takes place, then when you bring the cluster back up, the job resumes only on the Master it was last running on before the failure.

Re: Duplicate entries in output of mllib column similarities

2015-05-09 Thread Richard Bolkey
Hi Reza, After a bit of digging, I realized I had stated my previous issue a little bit wrong. We're not getting duplicate (i,j) entries, but we are getting transposed entries (i,j) and (j,i) with potentially different scores. We assumed the output would be a triangular matrix. Still, let me know if that's
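A plain-Python sketch (my own helper, not MLlib API) of one way to collapse transposed (i,j)/(j,i) entries into the triangular form the poster expected, here averaging the two scores when a pair appears in both orders:

```python
def to_upper_triangular(entries):
    """Collapse (i, j, score) and (j, i, score') into a single
    entry keyed by (min(i, j), max(i, j)), averaging the scores
    when a pair appears twice."""
    merged = {}
    for i, j, score in entries:
        key = (min(i, j), max(i, j))
        merged.setdefault(key, []).append(score)
    return {k: sum(v) / len(v) for k, v in merged.items()}

entries = [(0, 1, 0.9), (1, 0, 0.8), (2, 3, 0.5)]
# (0, 1) and (1, 0) merge into one averaged entry; (2, 3) passes through.
print(to_upper_triangular(entries))
```

Whether averaging is the right merge rule depends on why the two scores differ, so treat this as a workaround sketch, not a fix for the underlying behavior.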

Re: Spark SQL and Hive interoperability

2015-05-09 Thread barge.nilesh
Hi, try your first method but create an external table in Hive, like: hive -e "CREATE EXTERNAL TABLE people (name STRING, age INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';"

How to implement an Evaluator for a ML pipeline?

2015-05-09 Thread Stefan H.
Hello everyone, I am stuck with the (still experimental, I think) ML pipelines API. I have a pipeline with just one estimator (ALS) and I want it to try different values for the regularization parameter. Therefore I need to supply an Evaluator that returns a value of type Double. I
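As a sketch of the kind of metric such an Evaluator must produce (plain Python for illustration, not the spark.ml Evaluator interface itself; a real implementation would extend org.apache.spark.ml.Evaluator and read the prediction column of a DataFrame), here is an RMSE over (predicted, actual) rating pairs:

```python
from math import sqrt

def rmse(pairs):
    """Root-mean-square error over (predicted, actual) pairs: the
    single Double an ALS cross-validation evaluator would return."""
    if not pairs:
        raise ValueError("no predictions to evaluate")
    return sqrt(sum((p - a) ** 2 for p, a in pairs) / len(pairs))

# Two predictions: one exact, one off by 2.0.
print(rmse([(3.0, 3.0), (4.0, 2.0)]))  # sqrt((0 + 4) / 2) = sqrt(2)
```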

Spark can not access jar from HDFS !!

2015-05-09 Thread Ravindra
Hi All, I am trying to create custom UDFs with hiveContext as given below - scala> hiveContext.sql("CREATE TEMPORARY FUNCTION sample_to_upper AS 'com.abc.api.udf.MyUpper' USING JAR 'hdfs:///users/ravindra/customUDF2.jar'") I have put the UDF jar in HDFS at the path given above. The same

spark and binary files

2015-05-09 Thread tog
Hi, I have an application that currently runs using MR. It starts by extracting information from a proprietary binary file that is copied to HDFS, and then creates business objects from the extracted information. Later those objects are used for further

RE: Using Pandas/Scikit Learning in Pyspark

2015-05-09 Thread Felix C
Your Python job runs in a Python process interacting with the JVM. You need a matching Python version and the same dependent packages on the driver and all worker nodes if you run in YARN mode. --- Original Message --- From: Bin Wang binwang...@gmail.com Sent: May 8, 2015 9:56 PM To: