Re: Kafka directsream receiving rate

2016-02-06 Thread Diwakar Dhanuskodi
Again, why only 150K? Any clarification is much appreciated on directStream processing millions per batch. Sent from Samsung Mobile. Original message From: Cody Koeninger <c...@koeninger.org> Date:06/02/2016 01:30 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.
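
With the direct stream, the records pulled per batch are bounded by spark.streaming.kafka.maxRatePerPartition (records per second per partition) when it is set, and by the backpressure rate estimator when spark.streaming.backpressure.enabled is true, so a low ceiling there is one plausible reason for seeing only ~150K per batch. A minimal sketch of the relevant settings (the values are assumptions, not from this thread):

    import org.apache.spark.SparkConf

    // Records per batch is roughly maxRatePerPartition * number of partitions * batch
    // interval, unless backpressure lowers the rate further.
    val conf = new SparkConf()
      .setAppName("direct-stream-rate")                          // hypothetical app name
      .set("spark.streaming.kafka.maxRatePerPartition", "50000") // per-partition cap, records/sec
      .set("spark.streaming.backpressure.enabled", "true")       // let Spark tune the actual rate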

Re: Kafka directsream receiving rate

2016-02-08 Thread Diwakar Dhanuskodi
Samsung Mobile. Original message From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Date:07/02/2016 01:39 (GMT+05:30) To: Cody Koeninger <c...@koeninger.org> Cc: user@spark.apache.org Subject: Re: Kafka directsream receiving rate Thanks Cody for trying to

RE: Apache Spark data locality when integrating with Kafka

2016-02-06 Thread Diwakar Dhanuskodi
Yes, to reduce network latency. Sent from Samsung Mobile. Original message From: fanooos Date:07/02/2016 09:24 (GMT+05:30) To: user@spark.apache.org Cc: Subject: Apache Spark data locality when integrating with Kafka Dears If I will use

Re: Kafka directsream receiving rate

2016-02-06 Thread Diwakar Dhanuskodi
Mobile. Original message From: Cody Koeninger <c...@koeninger.org> Date:06/02/2016 01:30 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: user@spark.apache.org Subject: Re: Kafka directsream receiving rate Have you tried just printing each message, to see w

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread Diwakar Dhanuskodi
Fanoos, where do you want the solution to be deployed? On premise or cloud? Regards, Diwakar. Sent from Samsung Mobile. Original message From: "Yuval.Itzchakov" Date:07/02/2016 19:38 (GMT+05:30) To: user@spark.apache.org Cc: Subject: Re: Apache

Re: Kafka directsream receiving rate

2016-02-05 Thread Diwakar Dhanuskodi
er.jar /root/Jars/sparkreceiver.jar Sent from Samsung Mobile. Original message From: Cody Koeninger <c...@koeninger.org> Date:05/02/2016 22:07 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: user@spark.apache.org Subject: Re: Kaf

RE: Kafka directsream receiving rate

2016-02-04 Thread Diwakar Dhanuskodi
message From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Date:05/02/2016 07:33 (GMT+05:30) To: user@spark.apache.org Cc: Subject: Kafka directsream receiving rate Hi, Using Spark 1.5.1. I have a topic with 20 partitions. When I publish 100 messages, Spark direct

kafkaDirectStream usage error

2016-02-04 Thread Diwakar Dhanuskodi
.unsafe.enabled=false" --conf "spark.streaming.backpressure.enabled=true" --conf "spark.locality.wait=1s" --conf "spark.shuffle.consolidateFiles=true"   --driver-memory 2g --executor-memory 1g --class com.tcs.dime.spark.SparkReceiver   --files /etc/hadoop/

Re: Kafka directsream receiving rate

2016-02-05 Thread Diwakar Dhanuskodi
<c...@koeninger.org> Date:06/02/2016 00:33 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: user@spark.apache.org Subject: Re: Kafka directsream receiving rate How are you counting the number of messages? I'd go ahead and remove the settings for

Re: Spark Submit

2016-02-12 Thread Diwakar Dhanuskodi
Try spark-submit --conf "spark.executor.memory=512m" --conf "spark.executor.extraJavaOptions=x" --conf "Dlog4j.configuration=log4j.xml" Sent from Samsung Mobile. Original message From: Ted Yu Date:12/02/2016 21:24 (GMT+05:30) To: Ashish Soni
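
"Dlog4j.configuration" by itself is not a Spark property; the quoted command most likely line-wraps spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.xml. A sketch of the same settings expressed through SparkConf instead of the command line (the log4j.xml still has to reach the executors, e.g. via --files):

    import org.apache.spark.SparkConf

    // Same settings as the quoted spark-submit, applied in code; a sketch, not the
    // thread's actual configuration.
    val conf = new SparkConf()
      .set("spark.executor.memory", "512m")
      .set("spark.executor.extraJavaOptions", "-Dlog4j.configuration=log4j.xml")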

RE: HADOOP_HOME are not set when try to run spark application in yarn cluster mode

2016-02-09 Thread Diwakar Dhanuskodi
loading resource file:/usr/lib/hadoop/etc/hadoop/yarn-site.xml -> hdfs://quickstart.cloudera:8020/user/cloudera/.sparkStaging/application_1455041341343_0002/yarn-site.xml From: Diwakar Dhanuskodi [mailto:diwakar.dhanusk...@gmail.com] Sent: Tuesday, February 09, 2016 10:00 AM To:

Re: AM creation in yarn client mode

2016-02-09 Thread Diwakar Dhanuskodi
Your 2nd assumption is correct. There is a YARN client which polls the AM while running in yarn-client mode. Sent from Samsung Mobile. Original message From: ayan guha Date:10/02/2016 10:55 (GMT+05:30) To: praveen S Cc: user

RE: HADOOP_HOME are not set when try to run spark application in yarn cluster mode

2016-02-09 Thread Diwakar Dhanuskodi
Pass all Hadoop conf files as spark-submit parameters in --files. Sent from Samsung Mobile. Original message From: Rachana Srivastava Date:09/02/2016 22:53 (GMT+05:30) To: user@spark.apache.org Cc: Subject: HADOOP_HOME are not

Re: Apache Spark data locality when integrating with Kafka

2016-02-07 Thread Diwakar Dhanuskodi
الليثي <dev.fano...@gmail.com> Date:08/02/2016 02:07 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: "Yuval.Itzchakov" <yuva...@gmail.com>, user <user@spark.apache.org> Subject: Re: Apache Spark data locality when integrating with Kafka Diw

RE: Fwd: DF creation

2016-03-19 Thread Diwakar Dhanuskodi
Import sqlContext.implicits._ before using df(). Sent from Samsung Mobile. Original message From: satyajit vegesna Date:19/03/2016 06:00 (GMT+05:30) To: user@spark.apache.org, d...@spark.apache.org Cc: Subject: Fwd: DF creation Hi, I am
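
The implicits import is what brings toDF() into scope for RDDs of tuples or case classes. A minimal Spark 1.x sketch (app name and column names are made up):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("df-creation").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._   // required before calling toDF()

    val df = sc.parallelize(Seq(("a", 1), ("b", 2))).toDF("key", "value")
    df.show()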

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
Cody Koeninger <c...@koeninger.org> wrote: > Those logs you're posting are from right after your failure, they don't > include what actually went wrong when attempting to read json. Look at your > logs more carefully. > On Aug 10, 2016 2:07 AM, "Diwakar Dhanuskodi" <

Re: Spark streaming not processing messages from partitioned topics

2016-08-11 Thread Diwakar Dhanuskodi
Figured it out. All I was doing wrong was testing it in a pseudo-node VM with 1 core; the tasks were waiting for CPU. In a production cluster this works just fine. On Thu, Aug 11, 2016 at 12:45 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > Checked executor lo
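
For anyone hitting the same symptom locally: give the local master at least two cores, otherwise streaming tasks queue behind each other. A minimal sketch (app name and batch interval are assumptions):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // local[2] or more avoids starving the batch-processing tasks when testing
    // streaming on a single machine.
    val conf = new SparkConf().setAppName("streaming-test").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))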

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
the kafka partitions? > If you use kafka-simple-consumer-shell.sh to read that partition, do > you get any data? > > On Wed, Aug 10, 2016 at 9:40 AM, Diwakar Dhanuskodi > <diwakar.dhanusk...@gmail.com> wrote: > > Hi Cody, > > > > Just added zookeeper.connect

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
ntext and just do > > rdd => { > rdd.foreach(println) > > > as a base line to see if you're reading the data you expect > > On Tue, Aug 9, 2016 at 3:47 PM, Diwakar Dhanuskodi > <diwakar.dhanusk...@gmail.com> wrote: > > Hi, > > > > I am reading jso

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
dd => { > if (rdd.isEmpty()) { > println("Failed to get data from Kafka. Please check that the Kafka > producer is streaming data.") > System.exit(-1) > } > val sqlContext = org.apache.spark.sql.SQLContext.getOrCreate(rdd. > sparkContext) > val wea

Re: Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
at a time to the working > rdd.foreach example and see when it stops working, then take a closer > look at the logs. > > > On Tue, Aug 9, 2016 at 10:20 PM, Diwakar Dhanuskodi > <diwakar.dhanusk...@gmail.com> wrote: > > Hi Cody, > > > > Without conditional .

Re: Spark streaming not processing messages from partitioned topics

2016-08-10 Thread Diwakar Dhanuskodi
Added jobs for time 147081258 ms 16/08/10 12:34:00 INFO JobScheduler: Added jobs for time 147081264 ms On Wed, Aug 10, 2016 at 10:26 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > Hi Siva, > > Does topic has partitions? which version of Spark you are usi

KafkaUtils.createStream not picking smallest offset

2016-08-12 Thread Diwakar Dhanuskodi
Hi, We are using Spark 1.6.1 and Kafka 0.9. KafkaUtils.createStream is showing strange behaviour. Though auto.offset.reset is set to smallest, whenever we need to restart the stream it is picking up the latest offset, which is not expected. Do we need to set any
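
One behaviour worth noting with the receiver-based API: auto.offset.reset=smallest only applies when the consumer group has no offset committed in ZooKeeper, so an existing group resumes from its stored offsets on restart regardless of that setting; the actual cause in this case may differ. A sketch of the relevant parameters (host, group and topic names are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("offset-check"), Seconds(10))
    val kafkaParams = Map(
      "zookeeper.connect" -> "zkhost:2181",
      "group.id"          -> "fresh-group-id",   // a group with no stored offsets starts from 'smallest'
      "auto.offset.reset" -> "smallest")

    // Receiver-based stream; one receiver thread for the topic.
    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("topicName" -> 1), StorageLevel.MEMORY_AND_DISK_SER)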

Re: KafkaUtils.createStream not picking smallest offset

2016-08-13 Thread Diwakar Dhanuskodi
From: Cody Koeninger <c...@koeninger.org> Date:12/08/2016 23:42 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, user@spark.apache.org Cc: Subject: Re: KafkaUtils.createStream not picking smallest offset Are you checkpointing? Beyond that, why are you using

Re: Spark streaming takes longer time to read json into dataframes

2016-07-17 Thread Diwakar Dhanuskodi
. Original message From: Martin Eden <martineden...@gmail.com> Date:16/07/2016 14:01 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: user <user@spark.apache.org> Subject: Re: Spark streaming takes longer time to read json into dataframes Hi

Re: Spark streaming takes longer time to read json into dataframes

2016-07-19 Thread Diwakar Dhanuskodi
Mobile. Original message From: Cody Koeninger <c...@koeninger.org> Date:19/07/2016 20:49 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: Martin Eden <martineden...@gmail.com>, user <user@spark.apache.org> Subject: Re: Spark str

Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
Hello, I have 400K JSON messages pulled from Kafka into Spark Streaming using the DirectStream approach. The size of the 400K messages is around 5G. The Kafka topic is single-partitioned. I am using rdd.read.json(_._2) inside foreachRDD to convert the RDD into a dataframe. It takes almost 2.3 minutes to convert into
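
Without a schema, read.json has to make an extra pass over the data on every micro-batch just to infer one, which is a common reason this call is slow. A sketch that declares the schema up front, assuming the DataFrameReader form sqlContext.read.json(rdd.map(_._2)), an existing direct stream of (key, value) pairs, and made-up field names:

    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.types.{StringType, StructField, StructType}

    // Declare the JSON schema once so read.json does not re-infer it per batch.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("payload", StringType)))

    stream.foreachRDD { rdd =>
      if (!rdd.isEmpty()) {
        val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
        val df = sqlContext.read.schema(schema).json(rdd.map(_._2))
        df.show()
      }
    }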

Fwd: Spark streaming takes longer time to read json into dataframes

2016-07-15 Thread Diwakar Dhanuskodi
-- Forwarded message -- From: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Date: Sat, Jul 16, 2016 at 9:30 AM Subject: Re: Spark streaming takes longer time to read json into dataframes To: Jean Georges Perrin <j...@jgp.net> Hello, I need it in memory. Increa

Spark streaming not processing messages from partitioned topics

2016-08-09 Thread Diwakar Dhanuskodi
Hi, I am reading JSON messages from Kafka. The topic has 2 partitions. When running the streaming job using spark-submit, I could see that val dataFrame = sqlContext.read.json(rdd.map(_._2)) executes indefinitely. Am I doing something wrong here? Below is the code. This environment is a Cloudera sandbox
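
The baseline check suggested in the replies is to print the raw Kafka values before bringing SQLContext into the loop, to confirm both partitions are actually delivering data. A minimal sketch (broker and topic names are placeholders):

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("baseline"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("topicName"))

    stream.foreachRDD { rdd =>
      println(s"batch count = ${rdd.count()}")   // confirm data is arriving at all
      rdd.map(_._2).take(5).foreach(println)     // peek at the raw JSON strings
    }
    ssc.start()
    ssc.awaitTermination()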

createDirectStream parallelism

2016-08-18 Thread Diwakar Dhanuskodi
We are using the createDirectStream API to receive messages from a 48-partition topic. I am setting --num-executors 48 & --executor-cores 1 in spark-submit. All partitions were received in parallel and the corresponding RDDs in foreachRDD were executed in parallel. But when I join a transformed RDD
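
The 48 input partitions carry through map-like operations, but a join introduces a shuffle whose partitioning defaults to spark.default.parallelism (when set) or the largest parent RDD's partition count; passing numPartitions to the join makes the intent explicit. A small sketch with made-up data:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("join-parallelism"))
    val a = sc.parallelize(1 to 1000).map(i => (i % 48, i))
    val b = sc.parallelize(1 to 1000).map(i => (i % 48, i * 2))

    // Ask for 48 partitions explicitly so the join stage keeps the same parallelism.
    val joined = a.join(b, 48)
    println(joined.partitions.length)   // 48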

Spark streaming

2016-08-18 Thread Diwakar Dhanuskodi
Hi, Is there a way to specify in createDirectStream to receive only the last 'n' offsets of a specific topic and partition? I don't want to filter them out in foreachRDD. Sent from Samsung Mobile.
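
The direct API does allow an explicit starting position per partition through the createDirectStream overload that takes fromOffsets, so "last n" can be expressed as latest offset minus n, but the latest offset has to come from your own lookup (for example kafka.tools.GetOffsetShell or the simple consumer API). A sketch with placeholder broker, topic and offset values:

    import kafka.common.TopicAndPartition
    import kafka.message.MessageAndMetadata
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val ssc = new StreamingContext(new SparkConf().setAppName("last-n"), Seconds(10))
    val kafkaParams = Map("metadata.broker.list" -> "broker:9092")

    val n = 1000L
    val latestOffset = 50000L   // obtained elsewhere; Spark does not fetch this for you here
    val fromOffsets = Map(
      TopicAndPartition("topicName", 0) -> math.max(0L, latestOffset - n))

    val messageHandler = (mmd: MessageAndMetadata[String, String]) => (mmd.key, mmd.message)
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder,
        (String, String)](ssc, kafkaParams, fromOffsets, messageHandler)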

Best way to read XML data from RDD

2016-08-19 Thread Diwakar Dhanuskodi
Hi, There is an RDD with JSON data. I could read the JSON data using rdd.read.json. The JSON data has XML data in a couple of key-value pairs. Which is the best method to read and parse the XML from the RDD? Are there any specific XML libraries for Spark? Could anyone help on this? Thanks.
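
For plain extraction of values, scala-xml inside mapPartitions is often enough; spark-xml and spark-xml-utils, which come up later in this thread, are alternatives when schema inference or XPath over the fragments is needed. A sketch with made-up payloads and tag names (assuming scala-xml is on the classpath):

    import scala.xml.XML
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("xml-parse").setMaster("local[2]"))
    val xmlPayloads = sc.parallelize(Seq("<rec><id>1</id></rec>", "<rec><id>2</id></rec>"))

    val ids = xmlPayloads.mapPartitions { iter =>
      iter.map { s =>
        val doc = XML.loadString(s)   // parse one XML fragment
        (doc \ "id").text             // pull out the <id> value
      }
    }
    ids.collect().foreach(println)    // 1, 2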

Re: Best way to read XML data from RDD

2016-08-19 Thread Diwakar Dhanuskodi
to parse XML from 2 million messages in a 3-node environment with 100G and 4 CPUs each. Sent from Samsung Mobile. Original message From: Felix Cheung <felixcheun...@hotmail.com> Date:20/08/2016 09:49 (GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>,

Re: Best way to read XML data from RDD

2016-08-22 Thread Diwakar Dhanuskodi
, Aug 22, 2016 at 4:34 PM, Hyukjin Kwon <gurwls...@gmail.com> wrote: > Do you mind share your codes and sample data? It should be okay with > single XML if I remember this correctly. > > 2016-08-22 19:53 GMT+09:00 Diwakar Dhanuskodi < > diwakar.dhanusk...@gmail.com>: >

Re: Best way to read XML data from RDD

2016-08-22 Thread Diwakar Dhanuskodi
.@gmail.com> Cc: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>, user <user@spark.apache.org> Subject: Re: Best way to read XML data from RDD Another option would be to look at spark-xml-utils. We use this extensively in the man

Re: Best way to read XML data from RDD

2016-08-22 Thread Diwakar Dhanuskodi
(GMT+05:30) To: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com> Cc: Felix Cheung <felixcheun...@hotmail.com>, user <user@spark.apache.org> Subject: Re: Best way to read XML data from RDD I fear the issue is that this will create and destroy a XML parser object 2

Re: Best way to read XML data from RDD

2016-08-22 Thread Diwakar Dhanuskodi
: Jörn Franke <jornfra...@gmail.com> Cc: Diwakar Dhanuskodi <diwakar.dhanusk...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>, user <user@spark.apache.org> Subject: Re: Best way to read XML data from RDD Hi Diwakar, Spark XML library can take RDD as sourc

Re: Spark build 1.6.2 error

2016-09-03 Thread Diwakar Dhanuskodi
:44 AM, Diwakar Dhanuskodi < diwakar.dhanusk...@gmail.com> wrote: > Hi, > > java version 7 > > mvn command > ./make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Phive > -Phive-thriftserver -Pyarn -Dmaven.version=3.0.4 > > > yes, I executed script to c

Re: Spark build 1.6.2 error

2016-09-03 Thread Diwakar Dhanuskodi
park-sql_2.11 On Thu, Sep 1, 2016 at 8:23 AM, Divya Gehlot <divya.htco...@gmail.com> wrote: > Which java version are you using ? > > On 31 August 2016 at 04:30, Diwakar Dhanuskodi < > diwakar.dhanusk...@gmail.com> wrote: > >> Hi, >> >> While bui

Re: Spark build 1.6.2 error

2016-09-03 Thread Diwakar Dhanuskodi
a > > On Sat 3 Sep, 2016, 12:14 PM Diwakar Dhanuskodi, < > diwakar.dhanusk...@gmail.com> wrote: > >> Hi, >> >> Just re-ran again without killing zinc server process >> >> /make-distribution.sh --name custom-spark --tgz -Phadoop-2.6 -Phive >>

Hive connection issues in spark-shell

2016-09-03 Thread Diwakar Dhanuskodi
Hi, I recently built Spark using Maven. Now when starting spark-shell, it couldn't connect to Hive and I am getting the error below. I couldn't find the datanucleus jar in the built library, but the datanucleus jar is available in the hive/lib folder. java.lang.ClassNotFoundException:

Re: Pls assist: Spark 2.0 build failure on Ubuntu 16.06

2016-09-03 Thread Diwakar Dhanuskodi
Please run with -X and post the logs here. We can get the exact error from them. On Sat, Sep 3, 2016 at 7:24 PM, Marco Mistroni wrote: > hi all > > i am getting failures when building spark 2.0 on Ubuntu 16.06 > Here's details of what i have installed on the ubuntu host > - java 8

Spark build 1.6.2 error

2016-08-30 Thread Diwakar Dhanuskodi
Hi, While building Spark 1.6.2, I am getting the below error in spark-sql. Any help is much appreciated. [ERROR] missing or invalid dependency detected while loading class file 'WebUI.class'. Could not access term eclipse in package org, because it (or its dependencies) are missing. Check your build

RE: [Spark 2.0] ClassNotFoundException is thrown when using Hive

2016-08-18 Thread Diwakar Dhanuskodi
Hi, Can you cross-check by providing the same library path in --jars of spark-submit and running it? Sent from Samsung Mobile. Original message From: "颜发才(Yan Facai)" Date:18/08/2016 15:17 (GMT+05:30) To: "user.spark" Cc: Subject:

Foreachpartition in spark streaming

2017-03-20 Thread Diwakar Dhanuskodi
Just wanted to clarify: is foreachPartition in Spark an output operation? Which one is better to use, mapPartitions or foreachPartition? Regards Diwakar
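
For context: on a DStream the output operations are things like foreachRDD; inside it, rdd.foreachPartition is an RDD action used purely for side effects (e.g. one database connection per partition), while mapPartitions is a lazy transformation that returns a new RDD, so the choice depends on whether you need results back. A small sketch of the difference (the connection helpers are hypothetical):

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("partitions-demo").setMaster("local[2]"))
    val rdd = sc.parallelize(Seq("a", "b", "c"), 2)

    // Transformation: lazy, results come back as an RDD.
    val upper = rdd.mapPartitions(iter => iter.map(_.toUpperCase))
    println(upper.collect().mkString(","))   // A,B,C

    // Action: nothing is returned; typical use is one connection per partition.
    rdd.foreachPartition { iter =>
      // val conn = createConnection()       // hypothetical connection setup
      iter.foreach(println)
      // conn.close()
    }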

Spark yarn cluster

2020-07-11 Thread Diwakar Dhanuskodi
Hi, Would it be possible to set up Spark within a YARN cluster which may not have Hadoop? Thanks.

Re: Spark yarn cluster

2020-07-11 Thread Diwakar Dhanuskodi
need to use > other components of the Hadoop cluster, namely MapReduce and HDFS. > > That being said, if you just need cluster scheduling and not using > MapReduce nor HDFS it is possible you will be fine with the Spark > Standalone cluster. > > Regards, > Juan Martín