reading files with .list extension

2016-10-15 Thread Hafiz Mujadid
hi, I want to load files with the .list extension, such as actors.list.gz, in apache-spark. Can anybody please suggest a Hadoop input format for this kind of file? Thanks
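
A minimal sketch of one likely answer, assuming an existing SparkContext sc and a hypothetical hdfs path: no special input format is needed, because sc.textFile uses TextInputFormat and picks the decompression codec from the .gz extension, not from the .list suffix.

    // plain textFile handles a gzipped .list file; note that a .gz file
    // is not splittable, so it arrives as a single partition
    val lines = sc.textFile("hdfs:///data/actors.list.gz")
    lines.take(5).foreach(println)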

State management in spark-streaming

2015-12-06 Thread Hafiz Mujadid
Hi, I have spark streaming with mqtt as my source. There are continuous events from flame sensors, i.e. Fire and No-Fire. I want to generate a Fire event when the new event is Fire, and ignore all subsequent events until a No-Fire event happens. Similarly, if i get a No-Fire event i will
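
A hedged sketch of such edge detection with updateStateByKey, assuming events are parsed into (sensorId, isFire) pairs and that the last event in a batch wins — both assumptions, not part of the original thread:

    import org.apache.spark.streaming.StreamingContext._
    import org.apache.spark.streaming.dstream.DStream

    case class FireState(inFire: Boolean, alert: Boolean)

    // hypothetical: (sensorId, isFire) parsed from the MQTT payload;
    // note that updateStateByKey requires ssc.checkpoint(dir)
    val events: DStream[(String, Boolean)] = ???

    val tracked = events.updateStateByKey[FireState] {
      (batch: Seq[Boolean], st: Option[FireState]) =>
        val was = st.exists(_.inFire)
        val now = if (batch.isEmpty) was else batch.last
        Some(FireState(now, alert = !was && now)) // alert only on No-Fire -> Fire
    }
    val alerts = tracked.filter(_._2.alert).map(_._1) // sensors newly on fire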

writing to hive

2015-10-13 Thread Hafiz Mujadid
hi! I am following this tutorial to read and write from hive, but i am facing the following exception when i run the code. 15/10/12 14:57:36 INFO storage.BlockManagerMaster: Registered BlockManager 15/10/12 14:57:38

read from hive tables and write back to hive

2015-10-12 Thread Hafiz Mujadid
Hi! How can i read/write data from/to hive? Is it necessary to compile spark with the hive profile to interact with hive? Which maven dependencies are required to interact with hive? i could not find good step-by-step documentation for getting spark working with hive. thanks
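
For reference, a minimal sketch of the usual setup at the time, assuming a Spark build with -Phive (or the spark-hive_2.10 artifact on the classpath) and hypothetical table names:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-demo"))
    val hiveContext = new HiveContext(sc)

    // read from an existing hive table
    val df = hiveContext.sql("SELECT * FROM some_table")

    // write the result back as a new hive table
    df.saveAsTable("some_table_copy")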

Save dataframe into hbase

2015-09-02 Thread Hafiz Mujadid
Hi! What is an efficient way to save a DataFrame into hbase? Thanks

wild cards in spark sql

2015-09-02 Thread Hafiz Mujadid
Hi, does spark sql support wildcards to filter data in sql queries, just like we can filter data in RDBMS sql queries with wildcards like % and ? etc.? In other words, how can i write the following query in spark sql: select * from employee where ename like 'a%d' thanks
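
The LIKE predicate does work in Spark SQL; a minimal sketch, assuming an existing sqlContext and a registered employee table:

    // % matches any sequence of characters, _ a single character
    val matched = sqlContext.sql(
      "SELECT * FROM employee WHERE ename LIKE 'a%d'")
    matched.show()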

Schema From parquet file

2015-09-01 Thread Hafiz Mujadid
Hi all! Is there any way to get the schema from a parquet file without loading it into a dataframe? Thanks
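
One hedged note: creating the DataFrame is lazy, so asking it for the schema only touches the parquet footer metadata rather than the data. A sketch assuming Spark 1.4+ and a hypothetical path:

    val schema = sqlContext.read.parquet("hdfs:///data/events.parquet").schema
    println(schema.treeString) // prints column names and types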

reading multiple parquet file using spark sql

2015-09-01 Thread Hafiz Mujadid
Hi, I want to read multiple parquet files using the spark sql load method, just like we can pass multiple comma-separated paths to the sc.textFile method. Is there any way to do the same? Thanks
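
A minimal sketch, assuming Spark 1.4+ where read.parquet accepts multiple paths (the paths are hypothetical):

    val df = sqlContext.read.parquet(
      "hdfs:///data/part1.parquet", "hdfs:///data/part2.parquet")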

Writing test case for spark streaming checkpointing

2015-08-27 Thread Hafiz Mujadid
Hi! I have enabled checkpointing in spark streaming with kafka. I can see that spark streaming is checkpointing to the mentioned directory on hdfs. How can i test that it works fine and recovers with no data loss? Thanks

giving offset in spark sql

2015-08-04 Thread Hafiz Mujadid
Hi all! I want to skip the first n rows of a dataframe. This is done in normal sql using the offset keyword. How can we achieve this in spark sql? Thanks
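
Spark SQL 1.x has no OFFSET; one workaround sketch is to go through the underlying RDD, assuming an existing df and sqlContext (and noting that "first n" only has a defined meaning if the data has an order):

    val n = 10L
    val rest = df.rdd.zipWithIndex.filter(_._2 >= n).map(_._1)
    val restDf = sqlContext.createDataFrame(rest, df.schema)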

Custom partitioner

2015-07-26 Thread Hafiz Mujadid
Hi, I have csv data in which i have a date-time column. I want to partition my data into 12 partitions, with each partition containing data of one month only. I cannot figure out how to write such a partitioner and how to use it to read and write data. Kindly help me in this regard.
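
A sketch of one way, assuming the datetime sits in column 0 in yyyy-MM-dd HH:mm:ss format (both assumptions):

    import java.text.SimpleDateFormat
    import java.util.Calendar
    import org.apache.spark.Partitioner

    class MonthPartitioner extends Partitioner {
      override def numPartitions: Int = 12
      override def getPartition(key: Any): Int = {
        val fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss")
        val cal = Calendar.getInstance()
        cal.setTime(fmt.parse(key.toString))
        cal.get(Calendar.MONTH) // 0..11, one partition per month
      }
    }

    // usage: key each csv line by its datetime column, then partitionBy
    val rows = sc.textFile("data.csv").map { line =>
      val cols = line.split(",")
      (cols(0), line) // assume column 0 holds the datetime
    }
    rows.partitionBy(new MonthPartitioner).saveAsTextFile("out")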

com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException in spark with mysql database

2015-07-06 Thread Hafiz Mujadid
Hi! I am trying to load data from my mysql database using the following code: val query = "select * from " + table val url = "jdbc:mysql://" + dataBaseHost + ":" + dataBasePort + "/" + dataBaseName + "?user=" + db_user + "&password=" + db_pass val sc = new SparkContext(new

Converting spark JDBCRDD to DataFrame

2015-07-06 Thread Hafiz Mujadid
Hi all! What is the most efficient way to convert a JdbcRDD to a DataFrame? Any example? Thanks
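
A sketch of the manual route, assuming a JdbcRDD whose mapRow yields (Int, String) tuples — a hypothetical schema; in Spark 1.4+ the jdbc data source (sqlContext.read.jdbc) can skip JdbcRDD entirely:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val rowRdd = jdbcRdd.map { case (id, name) => Row(id, name) }
    val schema = StructType(Seq(
      StructField("id", IntegerType, nullable = false),
      StructField("name", StringType, nullable = true)))
    val df = sqlContext.createDataFrame(rowRdd, schema)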

lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
Hi all! I am trying to read records from offset 100 to 110 of a table using the following piece of code: val sc = new SparkContext(new SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]")) val sqlContext = new SQLContext(sc) val options = new HashMap[String, String]()

Re: lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
); Thanks, Manohar *From:* Hafiz Mujadid [via Apache Spark User List] *Sent:* Monday, July 6, 2015 10:56 AM *To:* Manohar Reddy *Subject:* lower and upper offset not working in spark with mysql database

Re: making dataframe for different types using spark-csv

2015-07-02 Thread Hafiz Mujadid
-> "false", "delimiter" -> ",", "mode" -> "FAILFAST")) From: Hafiz Mujadid hafizmujadi...@gmail.com Date: Wednesday, July 1, 2015 at 10:59 PM To: Mohammed Guller moham...@glassbeam.com Cc: Krishna Sankar ksanka...@gmail.com, user@spark.apache.org user@spark.apache.org Subject: Re: making dataframe

making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
Hi experts! I am using spark-csv to load csv data into a dataframe. By default it makes the type of each column String. Is there some way to get a dataframe with the actual types like int, double etc.? Thanks
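
Two hedged options: recent spark-csv versions can infer types via the inferSchema option, or an explicit StructType can be passed with .schema(...). A sketch assuming Spark 1.4+ with spark-csv on the classpath:

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true") // costs an extra pass over the data
      .load("data.csv")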

coalesce on dataFrame

2015-07-01 Thread Hafiz Mujadid
How can we use coalesce(1, true) on a dataFrame? Thanks
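
A hedged note: DataFrame.coalesce exists from Spark 1.4 (without the shuffle flag), and repartition(1) gives the shuffled behaviour of coalesce(1, true):

    val one = df.coalesce(1)            // Spark 1.4+, narrow, no shuffle
    val oneShuffled = df.repartition(1) // shuffles, like coalesce(1, true)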

Re: making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
the schema programmatically as shown here: https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema Mohammed *From:* Krishna Sankar [mailto:ksanka...@gmail.com] *Sent:* Wednesday, July 1, 2015 3:09 PM *To:* Hafiz Mujadid *Cc:* user

flume sinks supported by spark streaming

2015-06-23 Thread Hafiz Mujadid
Hi! I want to integrate flume with spark streaming. I want to know which flume sink types are supported by spark streaming. I saw one example using avroSink. Thanks

cassandra with jdbcRDD

2015-06-16 Thread Hafiz Mujadid
hi all! is there a way to connect cassandra with jdbcRDD?

redshift spark

2015-06-05 Thread Hafiz Mujadid
Hi All, I want to read and write data to aws redshift. I found the spark-redshift project at the following address: https://github.com/databricks/spark-redshift. In its documentation the following code is given: import com.databricks.spark.redshift.RedshiftInputFormat val records =

setting spark configuration properties problem

2015-05-05 Thread Hafiz Mujadid
Hi all, i have declared a spark context at the start of my program and then i want to change its configuration at some later stage in my code, as written below: val conf = new SparkConf().setAppName("Cassandra Demo") var sc: SparkContext = new SparkContext(conf)

empty jdbc RDD in spark

2015-05-02 Thread Hafiz Mujadid
Hi all! I am trying to read a hana database using spark jdbc RDD. Here is my code: def readFromHana() { val conf = new SparkConf() conf.setAppName("test").setMaster("local") val sc = new SparkContext(conf) val rdd = new JdbcRDD(sc, () => {

sap hana database load using jdbcRDD

2015-04-30 Thread Hafiz Mujadid
Hi! Can we load a hana database table using spark jdbc RDD? Thanks

saving schemaRDD to cassandra

2015-03-27 Thread Hafiz Mujadid
Hi experts! I would like to know: is there any way to store a schemaRDD to cassandra? If yes, then how to store it in an existing cassandra column family and in a new column family? Thanks

Downloading data from url

2015-03-17 Thread Hafiz Mujadid
Hi experts! Is there any api in spark to download data from a url? I want to download data from a url in a spark application, and I want the download to happen on all nodes instead of a single node. Thanks
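
A hedged sketch: SparkContext.addFile accepts http/https/ftp urls and ships the file to every node, where SparkFiles.get resolves the node-local copy (the url here is hypothetical):

    import org.apache.spark.SparkFiles

    sc.addFile("http://example.com/lookup.csv")
    // inside a task (or on the driver), resolve the node-local path:
    val localPath = SparkFiles.get("lookup.csv")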

connecting spark application with SAP hana

2015-03-12 Thread Hafiz Mujadid
Hi experts! Is there any way to connect to SAP hana in a spark application and get data from hana tables into our spark application? Thanks

spark streaming: batch interval, window interval and window sliding interval difference

2015-02-26 Thread Hafiz Mujadid
Can somebody explain the difference between batch interval, window interval and window sliding interval, with an example? Is there any real-time use case for these parameters? Thanks
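
A brief sketch of how the three relate (the socket source and durations are illustrative): the batch interval is fixed when the context is created, and the window and slide durations of any windowed operation must be multiples of it.

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // batch interval: input is cut into one RDD every 10 s
    val ssc = new StreamingContext(sc, Seconds(10))
    val lines = ssc.socketTextStream("localhost", 9999)

    // window interval 30 s: each result covers the last 3 batches;
    // sliding interval 20 s: a new windowed result every 2 batches
    val windowed = lines.window(Seconds(30), Seconds(20))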

running spark project using java -cp command

2015-02-09 Thread Hafiz Mujadid
hi experts! Is there any way to run a spark application using the java -cp command? thanks

using spark in web services

2015-02-09 Thread Hafiz Mujadid
Hi experts! I am trying to use spark in my restful web services. I am using the scala lift framework for writing the web services. Here is my boot class: class Boot extends Bootable { def boot { Constants.loadConfiguration val sc = new SparkContext(new

LeaseExpiredException while writing schemardd to hdfs

2015-02-03 Thread Hafiz Mujadid
I want to write a whole schemardd to a single file in hdfs but am facing the following exception: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder DFSClient_NONMAPREDUCE_-564238432_57

Re: Linkage error - duplicate class definition

2015-01-20 Thread Hafiz Mujadid
Have you solved this problem?

spark streaming kinesis issue

2015-01-20 Thread Hafiz Mujadid
Hi experts! I am using spark streaming with kinesis and getting this exception while running the program: java.lang.LinkageError: loader (instance of org/apache/spark/executor/ChildExecutorURLClassLoader$userClassLoader$): attempted duplicate class definition for name:

com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain

2015-01-19 Thread Hafiz Mujadid
Hi all! I am trying to use kinesis and spark streaming together, and when I execute the program I get the exception com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain. Here is my piece of code: val credentials = new

kinesis multiple records adding into stream

2015-01-16 Thread Hafiz Mujadid
Hi Experts! I am using the kinesis dependency as follows: groupId = org.apache.spark, artifactId = spark-streaming-kinesis-asl_2.10, version = 1.2.0. This pulls in aws sdk version 1.8.3, in which multiple records cannot be put in a single request. is it possible to put multiple records in a

Re: Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
it to scala. Thanks. On Thu, Jan 15, 2015 at 7:46 PM, Hafiz Mujadid [via Apache Spark User List] wrote: hi experts! I have an RDD[String] and i want to add a schema line at the beginning of this rdd. I know RDD is immutable. So

kinesis creating stream scala code exception

2015-01-15 Thread Hafiz Mujadid
Hi Experts, I want to consume data from a kinesis stream using spark streaming. I am trying to create the kinesis stream using scala code. Here is my code: def main(args: Array[String]) { println("Stream creation started") if (create(2)) println("Stream is created successfully")

Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
hi experts! I have an RDD[String] and i want to add a schema line at the beginning of this rdd. I know RDD is immutable. So is there any way to get a new rdd with one schema line plus the contents of the previous rdd? Thanks
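
A minimal sketch of the usual trick (the header text is hypothetical): build a one-element RDD and union it in front. Union keeps the first RDD's partitions first, so with a single-partition header the line lands at the top of the first output file, though that ordering is an implementation detail to rely on with care.

    val header = sc.parallelize(Seq("id,name,age"), 1)
    val withHeader = header.union(rdd)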

Re: creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
. (It will take time and is quite inefficient, though) Regards, Kevin. On Mon Jan 12 2015 at 7:57:39 PM Hafiz Mujadid [via Apache Spark User List] wrote: Hi experts! I have a schemaRDD of messages to be pushed to kafka. So I am using

creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
Hi experts! I have a schemaRDD of messages to be pushed to kafka, so I am using the following piece of code: rdd.foreachPartition(itr => { val props = new Properties() props.put("metadata.broker.list", brokersList)

skipping header from each file

2015-01-08 Thread Hafiz Mujadid
Suppose I give three file paths to the spark context to read, and each file has a schema row as its first line. How can we skip the schema lines from the headers? val rdd = sc.textFile("file1,file2,file3")

max receiving rate in spark streaming

2015-01-07 Thread Hafiz Mujadid
Hi experts! Is there any way to decide what an effective receiving rate for kafka spark streaming would be? Thanks

stopping streaming context

2015-01-05 Thread Hafiz Mujadid
Hi experts! Is there any way to stop a spark streaming context once 5 batches are completed? Thanks

getting number of partition per topic in kafka

2015-01-03 Thread Hafiz Mujadid
Hi experts! I am currently working on spark streaming with kafka, and I have a couple of questions related to this task. 1) Is there a way to find the number of partitions for a given topic name? 2) Is there a way to detect whether the kafka server is running or not? Thanks

Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

2014-12-31 Thread Hafiz Mujadid
I am accessing hdfs with spark's .textFile method and I receive the error: Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4. Here are my dependencies

removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
hi dears! Is there some efficient way to drop the first line of an RDD[String]? any suggestion? Thanks
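
One common sketch: the first line lives in partition 0, so it can be dropped there without collecting anything to the driver.

    val noHeader = rdd.mapPartitionsWithIndex { (i, iter) =>
      if (i == 0) iter.drop(1) else iter
    }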

Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
yep Michael Quinlan, it's working as suggested by Hoe Ren. thanks to you and Hoe Ren

SchemaRDD to RDD[String]

2014-12-23 Thread Hafiz Mujadid
Hi dears! I want to convert a schemaRDD into an RDD of String. How can we do that? Currently I am doing it like this, which is not converting correctly: there is no exception, but the resultant strings are empty. Here is my code: def SchemaRDDToRDD( schemaRDD : SchemaRDD ) : RDD[ String ] = { var
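
A hedged sketch of a simpler conversion: each Row can render itself, so per-field extraction (a likely source of the empty strings) can be avoided.

    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SchemaRDD

    def schemaRDDToRDD(schemaRDD: SchemaRDD): RDD[String] =
      schemaRDD.map(row => row.mkString(","))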

reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
Hi experts! What is an efficient way to read all files from a directory and its sub-directories in spark? Currently i move all files from the directory and its sub-directories into another temporary directory and then read them all using the sc.textFile method. But I want a method so that moving to
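
Two hedged sketches: a glob pattern covers a fixed depth, and on Hadoop 2.x the recursive input flag covers arbitrary depth (both paths are hypothetical).

    // fixed depth via glob:
    val twoLevels = sc.textFile("/data/*/*.txt")

    // arbitrary depth (Hadoop 2.x):
    sc.hadoopConfiguration.set(
      "mapreduce.input.fileinputformat.input.dir.recursive", "true")
    val all = sc.textFile("/data")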

Re: reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
thanks bethesda! But if we have a structure like this: a/b/a.txt a/c/c.txt a/d/e/e.txt, then how can we handle this case?

Saving Data only if Dstream is not empty

2014-12-08 Thread Hafiz Mujadid
Hi Experts! I want to save a DStream to HDFS only if it is not empty, such that it contains some kafka messages to be stored. What is an efficient way to do this? var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](ssc, params, topicMap,
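
One sketch of the usual guard inside foreachRDD, assuming the stream has already been mapped to printable values and a hypothetical output path (rdd.isEmpty, available from Spark 1.3, is an equivalent check):

    data.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) { // cheap emptiness probe, no full count
        rdd.saveAsTextFile("hdfs:///out/batch-" + System.currentTimeMillis)
      }
    }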

Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Hi! Does anybody have a useful example of the StreamingListener interface? When and how can we use this interface to stop streaming when one batch of data is processed? Thanks a lot
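
A hedged sketch: count completed batches in a listener and stop from outside the callback (stopping inside onBatchCompleted risks blocking the listener bus); assumes an existing ssc and a Spark 1.x stop signature with a graceful flag.

    import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

    @volatile var completed = 0
    ssc.addStreamingListener(new StreamingListener {
      override def onBatchCompleted(b: StreamingListenerBatchCompleted): Unit = {
        completed += 1
      }
    })
    ssc.start()
    while (completed < 1) Thread.sleep(100) // wait for the first batch
    ssc.stop(stopSparkContext = true, stopGracefully = true)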

Re: running Spark Streaming just once and stop it

2014-12-04 Thread Hafiz Mujadid
Hi Kal El! Have you managed to stop streaming after the first iteration? if yes, can you share example code? Thanks

Re: Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Thanks Akhil, you are so helpful, Dear.

getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Experts! Is there a way to read the first N messages from a kafka stream, put them in some collection, return them to the caller for visualization purposes, and close spark streaming? I will be glad to hear from you and will be thankful to you. Currently I have the following code: def

Re: getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Akhil! Thanks for your response. Can you please suggest how to return this sample from a function to the caller and stop SparkStreaming? Thanks

converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Hi everyone! I want to convert a DStream[String] into an RDD[String]. I could not find how to do this. var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder, DefaultDecoder](ssc, consumerConfig, topicMap, StorageLevel.MEMORY_ONLY).map(_._2) val streams =

Re: converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Thanks Dear, it is good to save this data to HDFS and then load it back into an RDD :)

Spark Streaming empty RDD issue

2014-12-03 Thread Hafiz Mujadid
Hi Experts, I am using Spark Streaming to integrate Kafka for real-time data processing, and I am facing some issues related to Spark Streaming. I want to know how we can detect: 1) our connection has been lost, 2) our receiver is down, 3) Spark Streaming has no new messages to consume. how can we deal