Re: SparkAppHandle.Listener.infoChanged behaviour

2017-06-05 Thread Mohammad Tariq
On Mon, Jun 5, 2017 at 7:24 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Sat, Jun 3, 2017 at 7:16 PM, Mohammad Tariq <donta...@gmail.com> wrote: > > I am having a bit of difficulty in understanding the exact behaviour of > > SparkAppHan

SparkAppHandle.Listener.infoChanged behaviour

2017-06-03 Thread Mohammad Tariq
Dear fellow Spark users, I am having a bit of difficulty in understanding the exact behaviour of *SparkAppHandle.Listener.infoChanged(SparkAppHandle handle)* method. The documentation says : *Callback for changes in any information that is not the handle's state.* What exactly is meant by *any
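For context, a minimal sketch of how such a listener is typically registered through *SparkLauncher.startApplication()* (the launcher classes and both callbacks are the real org.apache.spark.launcher API; the paths, class names and callback bodies are hypothetical):

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class ListenerSketch {
      public static void main(String[] args) throws Exception {
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/app.jar")   // hypothetical
            .setMainClass("com.example.MyApp")    // hypothetical
            .setMaster("yarn")
            .startApplication(new SparkAppHandle.Listener() {
              @Override
              public void stateChanged(SparkAppHandle h) {
                // fires on state transitions (CONNECTED, RUNNING, FINISHED, ...)
                System.out.println("state: " + h.getState());
              }
              @Override
              public void infoChanged(SparkAppHandle h) {
                // fires on non-state updates, e.g. once the app id becomes known
                System.out.println("app id: " + h.getAppId());
              }
            });
        // keep the JVM alive until the child app reaches a terminal state
        while (!handle.getState().isFinal()) {
          Thread.sleep(1000);
        }
      }
    }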

Application not found in RM

2017-04-17 Thread Mohammad Tariq
Dear fellow Spark users, *Use case :* I have written a small java client which launches multiple Spark jobs through *SparkLauncher* and captures jobs' metrics during the course of the execution. *Issue :* Sometimes the client fails saying - *Caused by:

Intermittent issue while running Spark job through SparkLauncher

2017-03-25 Thread Mohammad Tariq
Dear fellow Spark users, I have a multithreaded Java program which launches multiple Spark jobs in parallel through the *SparkLauncher* API. It also monitors these Spark jobs and keeps on updating information like job start/end time, current state, tracking URL, etc. in an audit table. To get these
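A minimal sketch of that setup — several jobs launched from a thread pool, each handle polled for the audit fields (the pool structure, class names and polling interval are hypothetical; SparkLauncher and SparkAppHandle are the real classes):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    public class ParallelLaunchSketch {
      public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(3);
        for (String jar : args) {                      // one Spark job per jar
          pool.submit(() -> {
            try {
              SparkAppHandle handle = new SparkLauncher()
                  .setAppResource(jar)
                  .setMainClass("com.example.Job")     // hypothetical
                  .setMaster("yarn")
                  .startApplication();
              long start = System.currentTimeMillis(); // job start time for the audit table
              while (!handle.getState().isFinal()) {
                // a real client would write state/tracking info to the audit table here
                System.out.println(jar + " -> " + handle.getState());
                Thread.sleep(5000);
              }
              System.out.println(jar + " took " + (System.currentTimeMillis() - start) + " ms");
            } catch (Exception e) {
              e.printStackTrace();
            }
          });
        }
        pool.shutdown();
      }
    }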

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
On Wed, Mar 1, 2017 at 12:27 AM, Mohammad Tariq <donta...@gmail.co

Re: using spark to load a data warehouse in real time

2017-02-28 Thread Mohammad Tariq
Hi Adaryl, You could definitely load data into a warehouse using Spark's JDBC support with DataFrames. Could you please explain your use case a bit more? That'll help us answer your query better.
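A minimal sketch of such a load with the DataFrame JDBC writer (Spark 1.4+ API; the URL, table and credentials are hypothetical):

    import java.util.Properties;
    import org.apache.spark.sql.SaveMode;

    // assumes an existing DataFrame `df` holding the rows to be loaded
    Properties props = new Properties();
    props.setProperty("user", "etl");        // hypothetical credentials
    props.setProperty("password", "secret");
    df.write()
      .mode(SaveMode.Append)                 // append new rows to the warehouse table
      .jdbc("jdbc:postgresql://dw-host:5432/dw", "fact_events", props);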

Re: Need guidelines in Spark Streaming and Kafka integration

2016-11-16 Thread Mohammad Tariq
Hi Karim, Are you looking for something specific? Some information about your use case would be really helpful in order to answer your question. On Wednesday, November 16, 2016, Karim, Md. Rezaul <rezaul.ka...@insight-centre.org> wrote: > Hi All, > > I am completely new with Kafka. I was

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
> https://github.com/apache/spark/blob/a8ea4da8d04c1ed621a96668118f20739145edd2/yarn/src/test/scala/org/apache/spark/deploy/yarn/YarnClusterSuite.scala#L164 > > On Thu, Nov 10, 2016 at 3:00 PM, Mohammad Tariq <donta...@gmail.com> > wrote: >> All I want to do is

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
On Fri, Nov 11, 2016 at 4:27 AM, Mohammad Tariq <donta...@gmail.com> wrote: > Yeah, that definitely makes sense. I wa

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
Vanzin <van...@cloudera.com> wrote: > On Thu, Nov 10, 2016 at 2:43 PM, Mohammad Tariq <donta...@gmail.com> > wrote: > > @Override > > public void stateChanged(SparkAppHandle handle) { > > System.out.println("Spark App Id [" + handle.getAppId()

Re: Correct SparkLauncher usage

2016-11-10 Thread Mohammad Tariq
Nov 8, 2016 at 5:16 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > Then you need to look at your logs to figure out why the child app is not > working. "startApplication" will by default redirect the child's output to > the parent's logs. > > On Mon, Nov 7, 201

Re: Correct SparkLauncher usage

2016-11-07 Thread Mohammad Tariq
On Tue, Nov 8, 2016 at 5:06 AM, Marcelo Vanzin <van...@cloudera.com> wrote: > On Mon, Nov 7, 2016 at 3:29 PM, Mohammad Tariq <donta...@gmail.com> wrote:

Correct SparkLauncher usage

2016-11-07 Thread Mohammad Tariq
Dear fellow Spark users, I have been trying to use *SparkLauncher.startApplication()* to launch a Spark app from within Java code, but am unable to do so. However, the same piece of code works if I use *SparkLauncher.launch()*. Here are the corresponding code snippets : *SparkAppHandle handle =
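For comparison, a minimal sketch of the two entry points (paths and class names hypothetical); note that a launch()-ed child's stdout/stderr must be drained by the caller, which is a common reason one variant appears to work while the other hangs:

    import org.apache.spark.launcher.SparkAppHandle;
    import org.apache.spark.launcher.SparkLauncher;

    SparkLauncher launcher = new SparkLauncher()
        .setAppResource("/path/to/app.jar")  // hypothetical
        .setMainClass("com.example.MyApp")   // hypothetical
        .setMaster("yarn-cluster");

    // Variant 1: launch() hands back the raw child process; the caller must
    // consume its output streams and wait for it itself.
    Process proc = launcher.launch();
    proc.waitFor();

    // Variant 2: startApplication() returns a handle and, by default,
    // redirects the child's output into the parent's logs.
    SparkAppHandle handle = launcher.startApplication();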

Re: [Error:] viewing Web UI on EMR cluster

2016-09-12 Thread Mohammad Tariq
Hi Divya, Do you have inbound access enabled on port 50070 of your NN machine? Also, it's a good idea to have the public DNS in your /etc/hosts for proper name resolution.

Re: Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
On 28 July 2016 at 12:45, Mohammad Tariq <donta...@gmail.com> wrote:

Is spark-1.6.1-bin-2.6.0 compatible with hive-1.1.0-cdh5.7.1

2016-07-28 Thread Mohammad Tariq
Could anyone please help me with this? I have been using the same version of Spark with CDH-5.4.5 successfully so far. However, after a recent CDH upgrade I'm not able to run the same Spark SQL module against hive-1.1.0-cdh5.7.1. When I try to run my program, Spark tries to connect to the local Derby
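If it helps, the embedded-Derby symptom usually means Spark cannot see the Hive configuration; a hedged sketch of the usual checks, assuming Spark 1.6 (the two metastore properties are real Spark SQL settings, but the paths and values here are guesses for a CDH layout):

    # make the CDH hive-site.xml visible to Spark, e.g. by copying/symlinking
    # it into $SPARK_HOME/conf, so Spark targets the real metastore instead
    # of a local Derby instance

    # optionally pin the metastore client explicitly (values hypothetical)
    spark.sql.hive.metastore.version  1.1.0
    spark.sql.hive.metastore.jars     /opt/cloudera/parcels/CDH/lib/hive/lib/*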

Re: Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Forgot to add, I'm on HBase 1.0.0-cdh5.4.5, so can't use HBaseContext. And the Spark version is 1.6.1. On Thu, Jun 16, 2016 at 10:12 PM, Mohammad Tariq <donta...@gmail.com> wrote: > Hi gr

Recommended way to push data into HBase through Spark streaming

2016-06-16 Thread Mohammad Tariq
Hi group, I have a streaming job which reads data from Kafka, performs some computation and pushes the result into HBase. Actually the results are pushed into 3 different HBase tables. So I was wondering what could be the best way to achieve this. Since each executor will open its own HBase
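One common shape for this — open the connection per partition rather than per record — as a minimal sketch, assuming the HBase 1.0 client and Spark 1.6's Java streaming API (the table name, column family and row-key layout are hypothetical):

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.spark.sql.Row;

    // assumes an existing JavaDStream<Row> `results`
    results.foreachRDD(rdd -> rdd.foreachPartition(rows -> {
      // one HBase connection per partition, closed when the partition is done
      try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
           Table table = conn.getTable(TableName.valueOf("table_one"))) {  // hypothetical
        while (rows.hasNext()) {
          Row row = rows.next();
          Put put = new Put(Bytes.toBytes(row.getString(0)));              // row key from col 0
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("val"),
              Bytes.toBytes(row.getString(1)));
          table.put(put);
        }
      }
    }));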

Re: Running Spark in Standalone or local modes

2016-06-11 Thread Mohammad Tariq
Hi Ashok, In local mode all the processes run inside a single JVM, whereas in standalone mode we have separate master and worker processes running in their own JVMs. To quickly test your code from within your IDE you could probably use the local mode. However, to get a real feel of how Spark
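In code, the difference is just the master URL (host and port hypothetical):

    import org.apache.spark.SparkConf;

    // local mode: driver, scheduler and "executor" threads all in this one JVM
    SparkConf localConf = new SparkConf().setAppName("test").setMaster("local[*]");

    // standalone mode: separate master/worker JVMs; the driver connects to the
    // cluster's master process
    SparkConf clusterConf = new SparkConf().setAppName("test").setMaster("spark://master-host:7077");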

DataFrame.foreach(scala.Function1) example

2016-06-10 Thread Mohammad Tariq
Dear fellow Spark users, Could someone please point me to any example showcasing the usage of *DataFrame.foreach(scala.Function1)* in *Java*? *Problem statement :* I am reading data from a Kafka topic, and for each RDD in the DStream I am creating a DataFrame in order to perform some operations.
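A hedged sketch of the two usual routes from Java against Spark 1.6 — satisfying scala.Function1 via AbstractFunction1, or sidestepping it through javaRDD() (the per-row body is hypothetical):

    import java.io.Serializable;
    import org.apache.spark.sql.Row;
    import scala.runtime.AbstractFunction1;
    import scala.runtime.BoxedUnit;

    // Route 1: implement the scala.Function1 signature directly.
    class RowPrinter extends AbstractFunction1<Row, BoxedUnit> implements Serializable {
      @Override
      public BoxedUnit apply(Row row) {
        System.out.println(row);   // the per-row operation goes here
        return BoxedUnit.UNIT;     // Scala's Unit, since foreach returns nothing
      }
    }
    // usage: df.foreach(new RowPrinter());

    // Route 2, often simpler from Java:
    //   df.javaRDD().foreach(row -> System.out.println(row));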

Re: Spark streaming readind avro from kafka

2016-06-01 Thread Mohammad Tariq
Hi Neeraj, You might find Kafka-Direct useful. BTW, are you using something like Confluent for your Kafka setup? If that's the case you might leverage the Schema Registry to get hold of the associated schema without additional

Recommended way to close resources in a Spark streaming application

2016-05-31 Thread Mohammad Tariq
Dear fellow Spark users, I have a streaming app which is reading data from Kafka, doing some computations and storing the results into HBase. Since I am new to Spark streaming I feel that there could still be scope for making my app better. To begin with, I was wondering what's the best way to
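On the shutdown side specifically, a minimal sketch assuming Spark 1.6 streaming (both the property and the stop() flags are real API; where to call them is up to the app):

    // let in-flight batches finish before the executors go away when the JVM exits
    sparkConf.set("spark.streaming.stopGracefullyOnShutdown", "true");

    // or stop explicitly: also stop the SparkContext (first flag) and drain
    // queued batches before returning (second flag)
    jssc.stop(true, true);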

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
for comments from the gurus. On Thu, Mar 3, 2016 at 5:35 AM, Mohammad Tariq <donta...@gmail.com> wrote: > Cool. Here is how it goes... > > I am reading Avro objects from a Kaf

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
> Hi Tariq, > > Can you tell in brief what kind of operation you have to do? I can try > helping you out with that. > In general, if you are trying to use any group operations you can use > window operations. > > On Wed, Mar 2, 2016 at 6:40 PM, Mohammad Tar

Re: Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
rying to > perform. > > On Wed, Mar 2, 2016 at 6:21 PM, Mohammad Tariq <donta...@gmail.com> wrote: > >> Hi list, >> >> *Scenario :* >> I am creating a DStream by reading an Avro object from a Kafka topic and >> then converting it into a DataFrame t

Does DataFrame.collect() maintain the underlying schema?

2016-03-02 Thread Mohammad Tariq
Hi list, *Scenario :* I am creating a DStream by reading an Avro object from a Kafka topic and then converting it into a DataFrame to perform some operations on the data. I call DataFrame.collect() and perform the intended operation on each Row of the Array[Row] it returns.
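A minimal sketch of what the question is probing, in Spark 1.6's Java API (column names hypothetical): the collected Rows keep both positional getters and, since each collected row carries its schema, name-based lookup:

    import org.apache.spark.sql.Row;

    // assumes an existing DataFrame `df` with columns "id" and "name"
    Row[] rows = df.collect();
    for (Row row : rows) {
      long id = row.getLong(0);          // positional access
      String name = row.getAs("name");   // name-based access via the row's schema
      System.out.println(id + " -> " + name + " (col " + row.fieldIndex("name") + ")");
    }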

Re: [Spark 1.5.2]: Iterate through Dataframe columns and put it in map

2016-03-02 Thread Mohammad Tariq
Hi Divya, You could call the *collect()* method provided by the DataFrame API. This will give you an *Array[Row]*. You could then iterate over this array and create your map. Something like this : val mapOfVals = scala.collection.mutable.Map[String,String]() var rows = DataFrame.collect() rows.foreach(r

Re: select * from mytable where column1 in (select max(column1) from mytable)

2016-02-25 Thread Mohammad Tariq
Spark doesn't support subqueries in the WHERE clause, IIRC. It supports subqueries only in the FROM clause as of now. See this ticket for more on this.
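The usual workaround, as a hedged sketch: rewrite the WHERE-clause subquery as a derived table in FROM (table and column names taken from the subject line):

    import org.apache.spark.sql.DataFrame;

    // assumes an existing SQLContext `sqlContext` with "mytable" registered
    DataFrame result = sqlContext.sql(
        "SELECT t.* FROM mytable t " +
        "JOIN (SELECT MAX(column1) AS maxc1 FROM mytable) m " +
        "ON t.column1 = m.maxc1");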

Re: Access fields by name/index from Avro data read from Kafka through Spark Streaming

2016-02-25 Thread Mohammad Tariq
anything you want. > > On Thu, Feb 25, 2016 at 11:06 AM, Mohammad Tariq <donta...@gmail.com> > wrote: > >> Hi group, >> >> I have just started working with confluent platform and spark streaming, >> and was wondering if it is possible to access individu

Re: Spark SQL support for sub-queries

2016-02-25 Thread Mohammad Tariq
AFAIK, this isn't supported yet. A ticket is in progress though. On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <

Access fields by name/index from Avro data read from Kafka through Spark Streaming

2016-02-25 Thread Mohammad Tariq
Hi group, I have just started working with the Confluent Platform and Spark Streaming, and was wondering if it is possible to access individual fields from an Avro object read from a Kafka topic through Spark Streaming. As per its default behaviour *KafkaUtils.createDirectStream[Object, Object,
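A hedged sketch of that default behaviour and the field access it allows, assuming Spark 1.6's Kafka direct API with Confluent's KafkaAvroDecoder (broker, registry, topic and field names hypothetical); the decoder hands back Avro GenericRecords, whose fields can be read by name:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import io.confluent.kafka.serializers.KafkaAvroDecoder;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.spark.streaming.api.java.JavaPairInputDStream;
    import org.apache.spark.streaming.kafka.KafkaUtils;

    Map<String, String> kafkaParams = new HashMap<>();
    kafkaParams.put("metadata.broker.list", "broker:9092");          // hypothetical
    kafkaParams.put("schema.registry.url", "http://registry:8081");  // hypothetical

    Set<String> topics = new HashSet<>();
    topics.add("my-topic");                                          // hypothetical

    // assumes an existing JavaStreamingContext `jssc`
    JavaPairInputDStream<Object, Object> stream = KafkaUtils.createDirectStream(
        jssc, Object.class, Object.class,
        KafkaAvroDecoder.class, KafkaAvroDecoder.class,
        kafkaParams, topics);

    stream.foreachRDD(rdd -> rdd.foreach(record -> {
      GenericRecord value = (GenericRecord) record._2();  // decoder yields GenericRecord
      System.out.println(value.get("someField"));         // hypothetical field
    }));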

Spark with proxy

2015-09-08 Thread Mohammad Tariq
Hi friends, Is it possible to interact with Amazon S3 using Spark via a proxy? This is what I have been doing : SparkConf conf = new SparkConf().setAppName("MyApp").setMaster("local"); JavaSparkContext sparkContext = new JavaSparkContext(conf); Configuration hadoopConf =
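The truncated snippet presumably goes on to configure the Hadoop side; a hedged sketch of proxy settings, assuming the s3a connector on Hadoop 2.7+ (where these keys exist; host, port and credentials hypothetical):

    import org.apache.hadoop.conf.Configuration;

    // continues from the JavaSparkContext above
    Configuration hadoopConf = sparkContext.hadoopConfiguration();
    hadoopConf.set("fs.s3a.access.key", "ACCESS");           // hypothetical
    hadoopConf.set("fs.s3a.secret.key", "SECRET");           // hypothetical
    hadoopConf.set("fs.s3a.proxy.host", "proxy.corp.local"); // hypothetical proxy
    hadoopConf.set("fs.s3a.proxy.port", "8080");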

Re: DataFrame insertIntoJDBC parallelism while writing data into a DB table

2015-06-16 Thread Mohammad Tariq
I would really appreciate it if someone could help me with this. On Monday, June 15, 2015, Mohammad Tariq donta...@gmail.com wrote: Hello list, The method *insertIntoJDBC(url: String, table: String, overwrite: Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into a JDBC DB

DataFrame insertIntoJDBC parallelism while writing data into a DB table

2015-06-15 Thread Mohammad Tariq
Hello list, The method *insertIntoJDBC(url: String, table: String, overwrite: Boolean)* provided by Spark DataFrame allows us to copy a DataFrame into a JDBC DB table. Similar functionality is provided by the *createJDBCTable(url: String, table: String, allowExisting: Boolean)* method. But if you
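On the parallelism question, a hedged sketch: the JDBC save runs per partition, each partition opening its own connection, so repartitioning the DataFrame first is one way to control the number of parallel writers (URL, table and partition count hypothetical):

    // assumes an existing DataFrame `df`
    df.repartition(8)
      .insertIntoJDBC("jdbc:mysql://db-host:3306/db", "target_table", false);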

Transactional guarantee while saving DataFrame into a DB

2015-06-02 Thread Mohammad Tariq
Hi list, With the help of the Spark DataFrame API we can save a DataFrame into a database table through an insertIntoJDBC() call. However, I could not find any info about how it handles the transactional guarantee. What if my program gets killed during the processing? Would it end up in a partial load?

Re: Forbidden : Error Code: 403

2015-05-18 Thread Mohammad Tariq
/file.avro, com.databricks.spark.avro); Thanks Best Regards On Sat, May 16, 2015 at 2:02 AM, Mohammad Tariq donta...@gmail.com wrote: Thanks for the suggestion Steve. I'll try that out. Read the long story last night while struggling with this :). I made sure that I don't have any

Forbidden : Error Code: 403

2015-05-15 Thread Mohammad Tariq
Hello list, *Scenario : *I am trying to read an Avro file stored in S3 and create a DataFrame out of it using the *Spark-Avro* <https://github.com/databricks/spark-avro> library, but am unable to do so. This is the code which I am using : public class S3DataFrame { public static void main(String[] args)
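For reference, a hedged sketch of the read path this thread describes, matching the Spark 1.3-era SQLContext.load(path, source) call visible in the reply above (bucket and credentials hypothetical; a secret key containing '/' was a known 403 trigger with s3n, which the follow-up alludes to):

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    public class S3DataFrame {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("S3DataFrame").setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        sc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "ACCESS");      // hypothetical
        sc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "SECRET");  // hypothetical

        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.load("s3n://my-bucket/file.avro",           // hypothetical
            "com.databricks.spark.avro");
        df.show();
      }
    }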

Re: Forbidden : Error Code: 403

2015-05-15 Thread Mohammad Tariq
Thanks for the suggestion Steve. I'll try that out. Read the long story last night while struggling with this :). I made sure that I don't have any '/' in my key. On Saturday, May 16, 2015, Steve Loughran ste...@hortonworks.com wrote: On 15 May 2015, at 21:20, Mohammad Tariq donta