spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to the ZooKeeper and bootstrap nodes, but it does not display any data. It does not give any error; it just shows 18/01/16 20:49:15 INFO Executor: Finished task
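
A minimal PySpark sketch of the same pipeline (the original program is Java, but the moving parts are identical): a direct Kafka stream, an output operation, and ssc.start(). Broker, topic, and app names here are placeholders. Setting auto.offset.reset to "smallest" replays the topic from the beginning, which quickly shows whether the problem is connectivity or simply that no new messages arrive after the consumer starts at the latest offset.

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils  # Spark 1.x/2.x streaming-kafka

sc = SparkContext(appName="KafkaDebug")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches

kafkaParams = {"metadata.broker.list": "localhost:9092",
               "auto.offset.reset": "smallest"}  # read the topic from the start
stream = KafkaUtils.createDirectStream(ssc, ["test_topic"], kafkaParams)

# Without an output operation (pprint, foreachRDD, ...) nothing is ever displayed.
stream.map(lambda kv: kv[1]).pprint()

ssc.start()
ssc.awaitTermination()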

Re: convert local tsv file to orc file on distributed cloud storage (openstack).

2016-11-24 Thread vr spark
Hi, the source file I have is on a local machine and it is pretty huge, around 150 GB. How do I go about it? On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran <ste...@hortonworks.com> wrote: > > On 19 Nov 2016, at 17:21, vr spark <vrspark...@gmail.com> wrote: > > Hi, > I am

convert local tsv file to orc file on distributed cloud storage (openstack).

2016-11-19 Thread vr spark
Hi, I am looking for Scala or Python code samples to convert a local TSV file to an ORC file and store it on distributed cloud storage (OpenStack). So I need these 3 samples; please suggest: 1. read the TSV, 2. convert to ORC, 3. store on distributed cloud storage. Thanks, VR
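
A rough sketch of the three steps in PySpark, under assumptions: Spark 2.x, the hadoop-openstack (swift://) connector configured on the classpath, and made-up paths, container, and provider names; a Scala version would use the same DataFrame reader/writer calls.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("TsvToOrc").getOrCreate()

# 1. read the TSV (assumed to carry a header row)
df = (spark.read
      .option("sep", "\t")
      .option("header", "true")
      .csv("file:///data/input.tsv"))

# 2 and 3. convert to ORC and write straight to Swift-backed object storage;
# "mycontainer.myprovider" stands in for whatever Swift service is configured.
df.write.mode("overwrite").orc("swift://mycontainer.myprovider/warehouse/input_orc")

For a very large local file, as in the 150 GB follow-up above, it is usually worth copying the source into the cluster's storage first so the read is distributed rather than funneled through one machine.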

receiving stream data options

2016-10-13 Thread vr spark
Hi, I have a continuous REST API stream which keeps spitting out data in the form of JSON. I access the stream using Python's requests.get(url, stream=True, headers=headers). I want to receive the records in Spark and do further processing. I am not sure which is the best way to receive them in Spark. What are
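
One common option, sketched here under assumptions (kafka-python installed, hypothetical URL and topic names): bridge the REST stream into Kafka with a small standalone producer, then let Spark Streaming consume the topic exactly as in the Kafka threads elsewhere on this page. This keeps Spark's receivers out of the long-lived HTTP connection and gives you replay if the job restarts.

import requests
from kafka import KafkaProducer  # kafka-python, assumed installed

producer = KafkaProducer(bootstrap_servers="localhost:9092")

resp = requests.get("https://example.com/stream", stream=True,
                    headers={"Accept": "application/json"})
for line in resp.iter_lines():
    if line:                                # skip keep-alive blank lines
        producer.send("rest_events", line)  # one JSON record per line

On the Spark side, KafkaUtils.createDirectStream on the "rest_events" topic (as in the other threads here) picks the records up for further processing.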

Re: spark-submit failing but job running from scala ide

2016-09-26 Thread vr spark
…http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/jaceklaskowski > > On Sun, Sep 25, 2016 at 4:32 PM, vr spark <vrspark...@gmail.com> wrote: > > Yes, I have both Spark 1.6 and Spark 2.0. > > I unset the SPARK_HOME environment variable and pointed spark-submit to

Running jobs against remote cluster from scala eclipse ide

2016-09-26 Thread vr spark
Hi, I use the Scala IDE for Eclipse. I usually run jobs against my local Spark installed on my Mac, then export the jars, copy them to my company's Spark cluster, and run spark-submit on it. This works fine. But I want to run the jobs from the Scala IDE directly against my company's Spark cluster.
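
A minimal sketch of the idea in PySpark, with a hypothetical master host name: point the SparkConf at the remote standalone master instead of local[*]. The driver machine has to be reachable from the cluster nodes, and for a Scala/Java app the packaged jar must also be shipped (setJars in the Scala SparkConf, or --jars with spark-submit).

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("RemoteFromIDE")
        .setMaster("spark://spark-master.mycompany.com:7077"))  # assumed master URL
sc = SparkContext(conf=conf)

print(sc.parallelize(range(100)).sum())  # quick smoke test against the cluster
sc.stop()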

Re: spark-submit failing but job running from scala ide

2016-09-25 Thread vr spark
…1. > > You've got two Spark runtimes up that may or may not contribute to the issue. > > Pozdrawiam, > Jacek Laskowski > > https://medium.com/@jaceklaskowski/ > Mastering Apache Spark 2.0 > http://bit.ly/mastering-apache-spark > Follow me at https://twitter.com/ja

spark-submit failing but job running from scala ide

2016-09-25 Thread vr spark
Hi, I have this simple Scala app which works fine when I run it as a Scala application from the Scala IDE for Eclipse. But when I export it as a jar and run it with spark-submit, I get the error below. Please suggest. *bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar* 16/09/24

Re: Undefined function json_array_to_map

2016-08-17 Thread vr spark
…raise AnalysisException(s.split(': ', 1)[1], stackTrace) AnalysisException: u'undefined function json_array_to_map; line 28 pos 73' On Wed, Aug 17, 2016 at 8:59 AM, vr spark <vrspark...@gmail.com> wrote: > spark 1.6.1 > python > > I0817 08:51:59.099356 15189 detect

Re: Attempting to accept an unknown offer

2016-08-17 Thread vr spark
…sql? > > On Wed, Aug 17, 2016 at 9:04 AM, vr spark <vrspark...@gmail.com> wrote: > >> spark 1.6.1 >> mesos >> The job runs for about 10-15 minutes, keeps giving this message, and I killed it. >> >> In this job, I am creating a data frame from a Hive SQL

Attempting to accept an unknown offer

2016-08-17 Thread vr spark
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492 W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493 W0816 23:17:01.985124 16360

Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi, I am getting an error in the scenario below. Please suggest. I have a virtual view in Hive named log_data with 2 columns: query_map (map) and parti_date (int). Here is my snippet for the Spark data frame: res=sqlcont.sql("select parti_date FROM
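
This error usually means json_array_to_map is a custom Hive UDF that Hive knows about but the Spark session analyzing the view does not. A hedged workaround sketch, not the poster's fix: either register the original UDF jar through the HiveContext (ADD JAR plus CREATE TEMPORARY FUNCTION), or re-implement it as a Spark UDF as below; the underlying table and column names and the JSON shape are assumptions.

import json
from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.types import MapType, StringType

sc = SparkContext(appName="JsonArrayToMapWorkaround")
sqlcont = HiveContext(sc)  # same variable name as in the snippet above

def json_array_to_map(s):
    # Assumed input shape: a JSON array of {"key": ..., "value": ...} objects.
    if s is None:
        return None
    return {str(d["key"]): str(d["value"]) for d in json.loads(s)}

# Register under the name the SQL expects so Spark can resolve it.
sqlcont.registerFunction("json_array_to_map", json_array_to_map,
                         MapType(StringType(), StringType()))

res = sqlcont.sql("SELECT json_array_to_map(raw_json) AS query_map, parti_date "
                  "FROM log_data_raw")  # hypothetical underlying table and column
res.show()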

Re: dataframe row list question

2016-08-12 Thread vr spark
Hi experts, please suggest. On Thu, Aug 11, 2016 at 7:54 AM, vr spark <vrspark...@gmail.com> wrote: > > I have data which is JSON in this format > > myList: array > | | |-- elem: struct > | | | |-- nm: string (nullable = true) > | | | |-- vL

dataframe row list question

2016-08-11 Thread vr spark
I have data which is JSON in this format: myList: array | | |-- elem: struct | | | |-- nm: string (nullable = true) | | | |-- vList: array (nullable = true) | | | | |-- element: string (containsNull = true) From my Kafka stream, I created a dataframe
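
A hedged sketch of one way to unpack that structure with explode, using a tiny in-memory stand-in for the Kafka-fed DataFrame; the field names (myList, nm, vList) are taken from the schema above, everything else is assumed.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col, explode

sc = SparkContext(appName="NestedListDemo")
sqlContext = SQLContext(sc)

# Stand-in for the streaming DataFrame: myList is an array of struct<nm, vList>.
sample = ['{"myList": [{"nm": "a", "vList": ["1", "2"]}, {"nm": "b", "vList": ["3"]}]}']
df = sqlContext.read.json(sc.parallelize(sample))

elems = df.select(explode(col("myList")).alias("elem"))       # one row per list element
flat = elems.select(col("elem.nm").alias("nm"),
                    explode(col("elem.vList")).alias("v"))    # one row per inner value
flat.show()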

Re: read only specific jsons

2016-07-27 Thread vr spark
…2016 at 12:05 PM, Cody Koeninger <c...@koeninger.org> wrote: > Have you tried filtering out corrupt records with something along the lines of > > df.filter(df("_corrupt_record").isNull) > > On Tue, Jul 26, 2016 at 1:53 PM, vr spark <vrspark...@gmail.com>
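
A sketch of the same idea in PySpark (the original post is PySpark), using a small in-memory batch as a stand-in for one micro-batch from the Kafka stream; the same filter dropped into the poster's mReport function would behave the same way.

from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.functions import col

sc = SparkContext(appName="ValidJsonOnly")
sqlContext = SQLContext(sc)

# A mix of well-formed and malformed records.
raw = sc.parallelize(['{"a": 1, "b": "x"}', 'this is not json', '{"a": 2, "b": "y"}'])
df = sqlContext.read.json(raw)

# Lines that fail to parse get every field null except _corrupt_record, so keeping
# rows where that column is null keeps only the JSON that parsed cleanly.
if "_corrupt_record" in df.columns:
    df = df.filter(col("_corrupt_record").isNull()).drop("_corrupt_record")
df.show()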

read only specific jsons

2016-07-26 Thread vr spark
I am reading data from Kafka using Spark Streaming. I am reading JSON and creating a dataframe. I am using PySpark. kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams) lines = kvs.map(lambda x: x[1]) lines.foreachRDD(mReport) def mReport(clickRDD): clickDF =

read only specific jsons

2016-07-26 Thread vr spark
I am reading data from Kafka using Spark Streaming. I am reading JSON and creating a dataframe. kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams) lines = kvs.map(lambda x: x[1]) lines.foreachRDD(mReport) def mReport(clickRDD): clickDF = sqlContext.jsonRDD(clickRDD)