How to disable pushdown predicate in spark 2.x query

2020-06-22 Thread Mohit Durgapal
Hi All, I am trying to read a table of a relational database using Spark 2.x. I am using code like the following: sparkContext.read().jdbc(url, table, connectionProperties).select('SELECT_COLUMN').where(whereClause); Now, what's happening is that Spark is actually pushing the predicate into the SQL query which Spark is
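A minimal sketch of one way to keep the filter on the Spark side, assuming Spark 2.4+ (which added the JDBC source option `pushDownPredicate`); `url`, `table`, `connectionProperties`, and `whereClause` are the names from the post above:

```scala
// Sketch, assuming Spark 2.4+: setting the JDBC option
// "pushDownPredicate" to false keeps the WHERE clause evaluated
// inside Spark instead of pushing it into the database query.
val df = spark.read
  .format("jdbc")
  .option("url", url)                                   // e.g. jdbc:mysql://host/db
  .option("dbtable", table)
  .option("user", connectionProperties.getProperty("user"))
  .option("password", connectionProperties.getProperty("password"))
  .option("pushDownPredicate", "false")                 // disable filter pushdown
  .load()
  .select("SELECT_COLUMN")
  .where(whereClause)                                   // now evaluated by Spark
```

On older 2.x versions without that option, a common workaround is to pass the full query as a subquery in the `dbtable` option so the database sees exactly the SQL you wrote.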

How to split a dataframe into two dataframes based on count

2020-05-18 Thread Mohit Durgapal
Dear All, I would like to know how, in Spark 2.0, I can split a dataframe into two dataframes when I know the exact counts the two dataframes should have. I tried using limit but got quite weird results. Also, I am looking for exact counts in the child dataframes, not an approximate %-based split.
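One deterministic approach is to index the rows and filter on the index — a sketch, where `df` is the source dataframe and `n` is the exact row count wanted in the first half:

```scala
// Sketch: limit() makes no guarantee about which rows it takes, so an
// exact-count split is easier via zipWithIndex on the underlying RDD.
import org.apache.spark.sql.Row

val schema  = df.schema
val indexed = df.rdd.zipWithIndex()                  // (row, 0-based index)

val first = spark.createDataFrame(
  indexed.filter { case (_, i) => i < n }.map(_._1), schema)

val second = spark.createDataFrame(
  indexed.filter { case (_, i) => i >= n }.map(_._1), schema)
```

`first` then has exactly `n` rows and `second` the remainder, at the cost of an extra pass over the data.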

getting error on spark streaming : java.lang.OutOfMemoryError: unable to create new native thread

2016-11-22 Thread Mohit Durgapal
Hi Everyone, I am getting the following error while running a Spark Streaming example on my local machine; the data being ingested is only 506KB. *16/11/23 03:05:54 INFO MappedDStream: Slicing from 1479850537180 ms to 1479850537235 ms (aligned to 1479850537180 ms and 1479850537235 ms)* *Exception
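This particular OutOfMemoryError is usually not about heap size: it means the JVM asked the OS for a new thread and was refused, typically because of a per-user process/thread limit. A quick check before launching Spark (limits and values are system-dependent):

```shell
# "unable to create new native thread" generally means the OS refused a
# thread, not that the heap is full. Inspect the per-user limit first:
ulimit -u          # max user processes (threads count against this)

# Raise it for the current shell before starting the Spark job, e.g.:
ulimit -u 4096
```

Reducing local parallelism (e.g. `setMaster("local[2]")` instead of `local[*]`) also lowers the number of threads the job spawns.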

Re: newbie question about RDD

2016-11-22 Thread Mohit Durgapal
Hi Raghav, Please refer to the following code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setAppName("PersonApp"); //creating java spark context JavaSparkContext sc = new JavaSparkContext(sparkConf); //reading a file from hdfs into a spark rdd, the name node is localhost JavaRDD
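A Scala equivalent of the truncated Java snippet above — the HDFS path, port, and `Person` line format are assumptions for illustration only:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setMaster("local[2]").setAppName("PersonApp")
val sc   = new SparkContext(conf)

// read a text file from HDFS (name node on localhost) into an RDD
val lines = sc.textFile("hdfs://localhost:9000/data/persons.txt")

// one Person per line, comma-separated fields (assumed format)
case class Person(name: String, age: Int)
val persons = lines.map(_.split(",")).map(a => Person(a(0), a(1).trim.toInt))

persons.take(5).foreach(println)
```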

how do you convert directstream into data frames

2015-08-13 Thread Mohit Durgapal
Hi All, After creating a direct stream like below: val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, topicsSet) I would like to convert the above stream into data frames, so that I could run hive queries over it. Could anyone
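The usual pattern is to convert each micro-batch inside `foreachRDD` and register it as a temp table — a sketch in the Spark 1.x API current at the time of the post, where `events` is the direct stream created above (its values are the Kafka message payloads):

```scala
// Sketch: each micro-batch of the direct stream becomes a DataFrame,
// registered as a temp table so SQL/HiveQL can be run over it.
import org.apache.spark.sql.SQLContext

events.foreachRDD { rdd =>
  val sqlContext = SQLContext.getOrCreate(rdd.sparkContext)
  import sqlContext.implicits._

  // events yields (key, value) pairs; keep just the message payloads
  val df = rdd.map(_._2).toDF("value")
  df.registerTempTable("events")

  sqlContext.sql("SELECT count(*) FROM events").show()
}
```

For Hive specifically, a `HiveContext` would be used in place of the plain `SQLContext`.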

Re: how do you convert directstream into data frames

2015-08-13 Thread Mohit Durgapal
Any idea anyone? On Fri, Aug 14, 2015 at 10:11 AM, Mohit Durgapal durgapalmo...@gmail.com wrote: Hi All, After creating a direct stream like below: val events = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, topicsSet) I would

spark-kafka directAPI vs receivers based API

2015-08-10 Thread Mohit Durgapal
Hi All, I just wanted to know how the direct API for Spark Streaming compares with the earlier receiver-based API. Has anyone used the direct-API-based approach in production, or is it still only being used for POCs? Also, since I'm new to Spark, could anyone share a starting point from where I could find a

spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Mohit Durgapal
I want to write a Spark Streaming consumer for Kafka in Java. I want to process the data in real time as well as store the data in HDFS in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate Kafka consumers, one for writing data to HDFS and one for Spark
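A single streaming job can serve both needs — a sketch where `stream` is a Kafka DStream of (key, value) pairs, and the HDFS path layout follows the year/month/day/hour format from the post (the namenode host is a placeholder):

```scala
// Sketch: each micro-batch is processed in real time AND landed in an
// HDFS directory partitioned by the batch time, so batch jobs can read
// it later. No second consumer is needed.
import java.text.SimpleDateFormat
import java.util.Date

stream.foreachRDD { (rdd, time) =>
  // real-time side: process the batch immediately
  val count = rdd.count()
  println(s"batch at $time: $count records")

  // batch side: append the raw payloads under year/month/day/hour
  val dir = new SimpleDateFormat("yyyy/MM/dd/HH").format(new Date(time.milliseconds))
  if (count > 0) rdd.map(_._2).saveAsTextFile(s"hdfs://namenode/events/$dir")
}
```

This keeps a single source of truth for offsets; the alternative of two independent consumers makes it hard to keep the real-time and archived views consistent.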

spark streaming from kafka real time + batch processing in java

2015-02-05 Thread Mohit Durgapal
I want to write a Spark Streaming consumer for Kafka in Java. I want to process the data in real time as well as store the data in HDFS in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate Kafka consumers, one for writing data to HDFS and one for Spark

connecting spark with ActiveMQ

2015-02-03 Thread Mohit Durgapal
Hi All, I have a requirement where I need to consume messages from ActiveMQ and do live stream processing as well as batch processing using Spark. Is there a Spark plugin or library that can enable this? If not, then do you know any other way this could be done? Regards Mohit
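Spark Streaming ships no built-in ActiveMQ source, but a custom JMS receiver is a common route — a sketch (the class below and its connection details are illustrative, not a published API):

```scala
// Sketch of a custom receiver that pulls text messages from an
// ActiveMQ queue into a DStream via the JMS API.
import javax.jms._
import org.apache.activemq.ActiveMQConnectionFactory
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class ActiveMQReceiver(brokerUrl: String, queueName: String)
    extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {

  def onStart(): Unit = new Thread("ActiveMQ Receiver") {
    override def run(): Unit = {
      val conn = new ActiveMQConnectionFactory(brokerUrl).createConnection()
      conn.start()
      val session  = conn.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val consumer = session.createConsumer(session.createQueue(queueName))
      while (!isStopped()) {
        consumer.receive(1000) match {        // poll with a 1s timeout
          case msg: TextMessage => store(msg.getText)
          case _                => // timed-out poll or non-text message
        }
      }
      conn.close()
    }
  }.start()

  def onStop(): Unit = () // the polling thread exits once isStopped() is true
}

// usage: val lines = ssc.receiverStream(new ActiveMQReceiver(url, "events"))
```

The resulting DStream can then feed the live processing, with each batch also persisted for later batch jobs.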