streaming of binary files in PySpark

2017-05-22 Thread Yogesh Vyas
Hi, I want to use Spark Streaming to read binary files from HDFS. The documentation says to use binaryRecordsStream(directory, recordLength), but I don't understand what the record length means. Does it mean the size of the binary file or something else? Regards,
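
For context, recordLength is the fixed size in bytes of each record, not the file size: every file that appears in the directory is split into chunks of exactly that many bytes. A minimal sketch, assuming 100-byte records and a hypothetical HDFS path:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="BinaryRecordsDemo")
    ssc = StreamingContext(sc, 10)  # 10-second batch interval

    # Each new file is split into fixed 100-byte records; every record
    # arrives in the stream as a byte string of exactly that length.
    records = ssc.binaryRecordsStream("hdfs:///data/incoming", 100)
    records.count().pprint()

    ssc.start()
    ssc.awaitTermination()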

pandas DF DStream to Spark DF

2017-04-09 Thread Yogesh Vyas
Hi, I am writing a PySpark streaming job in which I am returning a pandas DataFrame in a DStream. Now I want to save this DStream of DataFrames to a Parquet file. How do I do that? I am trying to convert it to a Spark DataFrame, but I am getting multiple errors. Please suggest how to do that.
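
A DStream's RDDs can hold pandas DataFrames as ordinary Python objects, so the conversion has to happen per batch, inside foreachRDD. A minimal sketch, assuming Spark 2.x, that each RDD element is one pandas DataFrame, and hypothetical names (pandas_stream, the output path):

    from pyspark.sql import SparkSession

    def save_batch(time, rdd):
        if rdd.isEmpty():
            return
        spark = SparkSession.builder.getOrCreate()
        # collect() brings the pandas DataFrames to the driver,
        # so batches must stay small enough to fit there.
        for pdf in rdd.collect():
            spark.createDataFrame(pdf).write.mode("append") \
                 .parquet("hdfs:///output/parquet")

    pandas_stream.foreachRDD(save_batch)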

use UTF-16 decode in pyspark streaming

2017-04-06 Thread Yogesh Vyas
Hi, I am trying to decode binary data as UTF-16 in a Kafka consumer using Spark Streaming, but it gives the error: TypeError: 'str' object is not callable. I am doing it in the following way: kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer", {topic:
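
The usual cause of that TypeError is passing an already-decoded string (or the result of calling decode) where createStream expects a decoder function. A minimal sketch, assuming ssc, zkQuorum and topic are defined as in the original snippet and one receiver thread per topic:

    from pyspark.streaming.kafka import KafkaUtils

    # The decoder must be a callable that receives the raw message bytes.
    def utf16_decoder(b):
        return b.decode("utf-16") if b is not None else None

    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",
                                  {topic: 1},
                                  valueDecoder=utf16_decoder)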

reading binary file in spark-kafka streaming

2017-04-05 Thread Yogesh Vyas
Hi, I have a binary file which I read in a Kafka producer and send to the message queue, and which I then read in the Spark-Kafka consumer as a streaming job. But it gives me the following error: UnicodeDecodeError: 'utf8' codec can't decode byte 0xa9 in position 112: invalid start byte. Can anyone
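
createStream decodes keys and values as UTF-8 by default, which is exactly what fails on arbitrary binary payloads. A minimal sketch of one workaround, passing identity decoders so the raw bytes come through untouched; ssc, zkQuorum and topic are assumed defined:

    from pyspark.streaming.kafka import KafkaUtils

    kvs = KafkaUtils.createStream(ssc, zkQuorum, "spark-streaming-consumer",
                                  {topic: 1},
                                  keyDecoder=lambda b: b,    # leave bytes as-is
                                  valueDecoder=lambda b: b)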

read binary file in PySpark

2017-04-02 Thread Yogesh Vyas
Hi, I am trying to read a binary file in PySpark using the API binaryRecords(path, recordLength), but it gives all values as ['\x00', '\x00', '\x00', '\x00',]. But when I try to read the same file using binaryFiles(), it gives me the correct RDD, but in the form of key-value pairs. The value
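
For comparison, a minimal sketch of both APIs; the paths and the 8-byte record length are hypothetical. If recordLength does not match the file's actual record size, the records can come out misaligned, which may be why everything looks like zero padding:

    # One fixed-length byte string per record:
    records = sc.binaryRecords("hdfs:///data/file.bin", 8)

    # One (path, content) pair per whole file; drop the key to get the bytes:
    contents = sc.binaryFiles("hdfs:///data/").map(lambda kv: kv[1])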

Disable logger in SparkR

2016-08-22 Thread Yogesh Vyas
Hi, is there any way of disabling console logging in SparkR? Regards, Yogesh

UDF in SparkR

2016-08-17 Thread Yogesh Vyas
Hi, is there any way of using a UDF in SparkR? Regards, Yogesh

XLConnect in SparkR

2016-07-20 Thread Yogesh Vyas
Hi, I am trying to load and read Excel sheets from HDFS in SparkR using the XLConnect package. Can anyone help me find out how to read xls files from HDFS in SparkR? Regards, Yogesh

Handle empty kafka in Spark Streaming

2016-06-15 Thread Yogesh Vyas
Hi, does anyone know how to handle an empty Kafka topic while a Spark Streaming job is running? Regards, Yogesh
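
An empty topic simply produces empty batches, so one common pattern is to guard each batch inside foreachRDD. A minimal sketch, assuming ssc, zkQuorum and topic are defined:

    from pyspark.streaming.kafka import KafkaUtils

    kvs = KafkaUtils.createStream(ssc, zkQuorum, "consumer-group", {topic: 1})

    def handle(time, rdd):
        if rdd.isEmpty():
            return  # no messages arrived in this batch interval
        # ... process the non-empty batch here ...

    kvs.foreachRDD(handle)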

Re: Handling Empty RDD

2016-05-22 Thread Yogesh Vyas
hih...@gmail.com> wrote: > You mean when rdd.isEmpty() returned false, saveAsTextFile still produced an empty file? > Can you show a code snippet that demonstrates this? > Cheers > On Sun, May 22, 2016 at 5:17 AM, Yogesh Vyas <informy...@gmail.com> wrote:

Handling Empty RDD

2016-05-22 Thread Yogesh Vyas
Hi, I am reading files using textFileStream, performing some action on them and then saving them to HDFS using saveAsTextFile. But whenever there is no file to read, Spark writes an empty RDD ([]) to HDFS. So how do I handle the empty RDD? I checked rdd.isEmpty() and rdd.count() > 0, but both of
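
One pattern that avoids the empty output is to do the emptiness check inside foreachRDD, on the final transformed RDD, immediately before saving. A minimal sketch; processed stands for the transformed DStream and the output path is hypothetical:

    def save(time, rdd):
        # Only write when the batch actually produced data.
        if not rdd.isEmpty():
            rdd.saveAsTextFile("hdfs:///output/" + str(time).replace(" ", "_"))

    processed.foreachRDD(save)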

Filter out the elements from xml file in Spark

2016-05-19 Thread Yogesh Vyas
Hi, I have XML files which I am reading through textFileStream and then filtering for the required elements using traditional conditions and loops. I would like to know whether there are any specific packages or functions provided in Spark to perform operations on an RDD of XML? Regards, Yogesh
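
Spark itself has no XML-specific RDD operations (the third-party spark-xml package covers DataFrames), but standard-library parsing inside a map works. A minimal sketch, assuming each record holds a complete XML document; the "name" tag is hypothetical:

    import xml.etree.ElementTree as ET

    def extract_names(record):
        root = ET.fromstring(record)
        return [el.text for el in root.iter("name")]

    # lines: the DStream obtained from textFileStream
    names = lines.flatMap(extract_names)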

File not found exception while reading from folder using textFileStream

2016-05-18 Thread Yogesh Vyas
Hi, I am trying to read files in a streaming way using Spark Streaming. For this I am copying files from my local folder to the source folder from which Spark reads the files. After reading and printing some of the files, it gives the following error: Caused by:
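
A likely cause: textFileStream expects files to appear in the monitored directory atomically, and a plain copy exposes half-written files. The usual fix is to write the file elsewhere and then move/rename it into place. A minimal sketch with hypothetical paths:

    import shutil

    # Write the file completely under a staging directory first...
    staging = "/data/staging/part-0001.txt"
    watched = "/data/watched/part-0001.txt"

    # ...then move it in; a rename on the same filesystem is atomic,
    # so Spark never sees a partially written file.
    shutil.move(staging, watched)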

Save DataFrame to Hive Table

2016-02-29 Thread Yogesh Vyas
Hi, I have created a DataFrame in Spark and now I want to save it directly into a Hive table. How do I do it? I have created the Hive table using the following HiveContext: HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc()); hiveContext.sql("CREATE TABLE IF NOT
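
For what it's worth, a minimal PySpark sketch of the same idea (the Java API is analogous); the table name is hypothetical and sc is an existing SparkContext:

    from pyspark.sql import HiveContext

    hive = HiveContext(sc)
    # Write the DataFrame straight into a Hive-managed table:
    df.write.mode("overwrite").saveAsTable("my_table")

    # Alternatively, go through a temporary table and INSERT:
    df.registerTempTable("tmp")
    hive.sql("INSERT INTO TABLE my_table SELECT * FROM tmp")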

Getting java.lang.IllegalArgumentException: requirement failed while calling Spark's MLlib StreamingKMeans from a Java application

2016-02-15 Thread Yogesh Vyas
Hi, I am trying to run StreamingKMeans from a Java application, but it gives the following error: java.lang.IllegalArgumentException: requirement failed. Below is my code: JavaDStream<Vector> v = trainingData.map(new
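
"requirement failed" from StreamingKMeans typically means a dimension mismatch between the model's initial centers and the incoming vectors. A minimal PySpark sketch of the same API, assuming 3-dimensional training vectors:

    from pyspark.mllib.clustering import StreamingKMeans

    # The dim passed to setRandomCenters must equal the length of every
    # vector in trainingData, or the requirement check fails.
    model = (StreamingKMeans(k=2, decayFactor=1.0)
             .setRandomCenters(dim=3, weight=1.0, seed=42))
    model.trainOn(trainingData)  # trainingData: DStream of length-3 vectors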

Visualization of KMeans cluster in Spark

2016-01-28 Thread Yogesh Vyas
Hi, is there any way to visualize the KMeans clusters in Spark? Can we connect Plotly with Apache Spark in Java? Thanks, Yogesh
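
Spark has no built-in plotting, so the usual approach is to sample the data to the driver and plot locally (Plotly or matplotlib both work; matplotlib is shown here). A minimal sketch, assuming data is an RDD of 2-D points and model is a trained KMeansModel:

    import matplotlib.pyplot as plt

    # Sample a manageable number of points and label them on the driver.
    pts = data.takeSample(False, 1000, seed=7)
    labels = [model.predict(p) for p in pts]

    plt.scatter([p[0] for p in pts], [p[1] for p in pts], c=labels)
    plt.show()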

NoSuchMethodError

2015-11-15 Thread Yogesh Vyas
Hi, while I am trying to read a JSON file using SQLContext, I get the following error: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.<init>(Lorg/apache/spark/api/java/JavaSparkContext;)V at com.honeywell.test.testhive.HiveSpark.main(HiveSpark.java:15)

Re: NoSuchMethodError

2015-11-15 Thread Yogesh Vyas
DataFrame df = sqlContext.read().json(pathToJSONFile); df.show(); On Mon, Nov 16, 2015 at 12:48 PM, Fengdong Yu <fengdo...@everstring.com> wrote: > what’s your SQL? >> On Nov 16, 2015, at 3:02 PM, Yogesh Vyas <informy...@gmail.com> wrote:

Re: JMX with Spark

2015-11-05 Thread Yogesh Vyas
Have you read this? > https://spark.apache.org/docs/latest/monitoring.html > Romi Kuntsman, Big Data Engineer > http://www.totango.com > On Thu, Nov 5, 2015 at 2:08 PM, Yogesh Vyas <informy...@gmail.com> wrote: >> Hi, >> How we can use

JMX with Spark

2015-11-05 Thread Yogesh Vyas
Hi, how can we use JMX and JConsole to monitor our Spark applications?
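
A sketch of the usual setup, in two pieces: Spark's built-in JmxSink is registered in conf/metrics.properties, and a JMX port is opened on the driver so JConsole can attach remotely (port 9999 and the disabled auth/SSL below are illustrative choices, not recommendations; your_app.py is hypothetical):

    # conf/metrics.properties -- register the JMX sink for all instances
    *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink

    # spark-submit, opening a JMX port on the driver for JConsole
    spark-submit \
      --conf "spark.driver.extraJavaOptions=-Dcom.sun.management.jmxremote \
      -Dcom.sun.management.jmxremote.port=9999 \
      -Dcom.sun.management.jmxremote.authenticate=false \
      -Dcom.sun.management.jmxremote.ssl=false" \
      your_app.py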

Fwd: Get the previous state string

2015-10-15 Thread Yogesh Vyas
-- Forwarded message -- From: Yogesh Vyas <informy...@gmail.com> Date: Thu, Oct 15, 2015 at 6:08 PM Subject: Get the previous state string To: user@spark.apache.org Hi, I am new to Spark and was trying to do some experiments with it. I had a JavaPairDStream<String, Lis

Get list of Strings from its Previous State

2015-10-15 Thread Yogesh Vyas
Hi, I am new to Spark and was trying to do some experiments with it. I had a JavaPairDStream<String, List<String>>. I want to get the list of strings from its previous state. For that I use the updateStateByKey function as follows: final Function2<List<List<String>>, Optional<List<String>>, Optional<List<String>>> updateFunc =
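
For comparison, a minimal PySpark sketch of the same pattern; pairs stands for a DStream of (key, list-of-strings) and the checkpoint path is hypothetical:

    # Checkpointing is required for any stateful operation.
    ssc.checkpoint("hdfs:///checkpoint")

    def update(new_values, state):
        # new_values: the lists that arrived for this key in the current batch
        # state: the list accumulated so far (None the first time a key is seen)
        result = list(state or [])
        for vs in new_values:
            result.extend(vs)
        return result

    states = pairs.updateStateByKey(update)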