RE: Best way to read XML data from RDD

2016-08-22 Thread Puneet Tripathi
I was building a small app to stream messages from Kafka via Spark. Each message is a separate XML document. I wrote a simple app to do so [this app expects the XML to be a single line]: from __future__ import print_function from pyspark.sql import Row import xml.etree.ElementTree as
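The post's code is cut off in this listing, so here is a minimal sketch of the per-message parsing step it describes, using only xml.etree.ElementTree. The function name parse_xml_message, the sample message, and the flat one-level layout of the XML are assumptions made for illustration; they are not taken from the original app.

```python
import xml.etree.ElementTree as ET


def parse_xml_message(line):
    """Parse one single-line XML message into a dict of child tag -> text.

    Assumes a flat layout (hypothetical): a root element whose direct
    children hold the field values. Returns None for malformed input so
    that one bad message does not fail the whole batch.
    """
    try:
        root = ET.fromstring(line)
    except ET.ParseError:
        return None
    return {child.tag: (child.text or "").strip() for child in root}


# Hypothetical sample message, one XML document on a single line.
sample = "<order><id>42</id><item>book</item></order>"
record = parse_xml_message(sample)
print(record)
```

In a streaming job, a function like this would typically be applied to the message value of each record with map, followed by a filter that drops the None results, before the parsed fields are turned into Row objects.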

writing Kafka dstream to local flat file

2016-07-21 Thread Puneet Tripathi
Hi, I am trying to consume from Kafka topics following http://spark.apache.org/docs/latest/streaming-kafka-integration.html Approach one (createStream). I am not able to write the stream to a local text file using the saveAsTextFiles() function. Below is the code: import pyspark from pyspark import

RE: Spark with HBase Error - Py4JJavaError

2016-07-08 Thread Puneet Tripathi
Hi Ram, Thanks very much, it worked. Puneet From: ram kumar [mailto:ramkumarro...@gmail.com] Sent: Thursday, July 07, 2016 6:51 PM To: Puneet Tripathi Cc: user@spark.apache.org Subject: Re: Spark with HBase Error - Py4JJavaError Hi Puneet, Have you tried appending --jars $SPARK_HOME/lib/spark

RE: Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Guys, Can anyone please help with the issue below? Puneet From: Puneet Tripathi [mailto:puneet.tripa...@dunnhumby.com] Sent: Thursday, July 07, 2016 12:42 PM To: user@spark.apache.org Subject: Spark with HBase Error - Py4JJavaError Hi, We are running HBase in fully distributed mode. I tried

Spark with HBase Error - Py4JJavaError

2016-07-07 Thread Puneet Tripathi
Hi, We are running HBase in fully distributed mode. I tried to connect to HBase via pyspark and then write to HBase using saveAsNewAPIHadoopDataset, but it failed with the error: Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.saveAsHadoopDataset. :