reading files with .list extension

2016-10-15 Thread Hafiz Mujadid
hi,


I want to load files with a .list extension (such as actors.list.gz) in Apache Spark. Can
anybody please suggest a Hadoop input format for this kind of file?


Thanks
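A minimal sketch (not from the original thread), assuming the .list.gz file is plain gzipped text: sc.textFile reads gzip-compressed text through Hadoop's TextInputFormat, so no special input format is needed (a single .gz file is not splittable, though, so it lands in one partition). The path below is hypothetical.

val actors = sc.textFile("hdfs:///data/actors.list.gz")
println(actors.take(5).mkString("\n"))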


State management in spark-streaming

2015-12-06 Thread Hafiz Mujadid
Hi,

I have a Spark Streaming job with MQTT as the source. It receives a continuous stream of
flame-sensor events, i.e. Fire and No-Fire. I want to emit a Fire event when the newest
event is Fire and then ignore all subsequent events until a No-Fire event arrives.
Similarly, after a No-Fire event I want to ignore all subsequent No-Fire events and wait
for the next Fire event.

Is it possible to handle this logic in a custom receiver? If yes, how can we
remember the last state (Fire/No-Fire) in the receiver?


Thanks
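A minimal sketch of one way to remember the last state inside a custom receiver (not from the thread): a field on the receiver instance persists across messages while the receiver is running, so only transitions need to be stored. The class name, MQTT wiring and message parsing below are hypothetical.

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class FlameReceiver(/* mqtt connection params */) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  @volatile private var lastWasFire: Option[Boolean] = None   // last forwarded state

  def onStart(): Unit = { /* connect to MQTT and call handle(msg) for every message */ }
  def onStop(): Unit = { /* disconnect */ }

  private def handle(msg: String): Unit = {
    val isFire = msg.trim.equalsIgnoreCase("Fire")            // hypothetical parsing
    if (lastWasFire != Some(isFire)) {                        // forward only transitions
      store(msg)
      lastWasFire = Some(isFire)
    }
  }
}

Note that the field is reset if the receiver is restarted, so for stronger guarantees the same transition filter can instead be applied downstream, e.g. with updateStateByKey.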



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/State-management-in-spark-streaming-tp25608.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




writing to hive

2015-10-13 Thread Hafiz Mujadid
hi!

I am following this tutorial to read and write from Hive, but I am facing the following
exception when I run the code.

15/10/12 14:57:36 INFO storage.BlockManagerMaster: Registered BlockManager
15/10/12 14:57:38 INFO scheduler.EventLoggingListener: Logging events to
hdfs://host:9000/spark/logs/local-1444676256555
Exception in thread "main" java.lang.VerifyError: Bad return type
Exception Details:
  Location:
   
org/apache/spark/sql/catalyst/expressions/Pmod.inputType()Lorg/apache/spark/sql/types/AbstractDataType;
@3: areturn
  Reason:
Type 'org/apache/spark/sql/types/NumericType$' (current frame, stack[0])
is not assignable to 'org/apache/spark/sql/types/AbstractDataType' (from
method signature)
  Current Frame:
bci: @3
flags: { }
locals: { 'org/apache/spark/sql/catalyst/expressions/Pmod' }
stack: { 'org/apache/spark/sql/types/NumericType$' }
  Bytecode:
000: b200 63b0

at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2595)
at java.lang.Class.getConstructor0(Class.java:2895)
at java.lang.Class.getDeclaredConstructor(Class.java:2066)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$4.apply(FunctionRegistry.scala:267)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$$anonfun$4.apply(FunctionRegistry.scala:267)
at scala.util.Try$.apply(Try.scala:161)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.expression(FunctionRegistry.scala:267)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.(FunctionRegistry.scala:148)
at
org.apache.spark.sql.catalyst.analysis.FunctionRegistry$.(FunctionRegistry.scala)
at
org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:414)
at
org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:413)
at
org.apache.spark.sql.UDFRegistration.(UDFRegistration.scala:39)
at org.apache.spark.sql.SQLContext.(SQLContext.scala:203)
at
org.apache.spark.sql.hive.HiveContext.(HiveContext.scala:72)


Is there any suggestion on how to read from and write to Hive?

thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/writing-to-hive-tp25046.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




read from hive tables and write back to hive

2015-10-12 Thread Hafiz Mujadid
Hi!

How can I read/write data from/to Hive?
Is it necessary to compile Spark with the hive profile to interact with Hive?
Which Maven dependencies are required to interact with Hive?

I could not find good documentation to follow step by step to get working
with Hive.

thanks
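A minimal sketch, assuming Spark 1.3+ built with -Phive (or the spark-hive_2.10 artifact on the classpath, with the same version as spark-core, since mixed versions typically produce errors like the VerifyError in the previous thread) and a hive-site.xml pointing at the metastore; the table names are hypothetical.

import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)
val df = hiveContext.sql("SELECT key, value FROM src")   // read an existing Hive table
df.saveAsTable("src_copy")                               // write back through the metastore (1.3-era API)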



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/read-from-hive-tables-and-write-back-to-hive-tp25028.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Save dataframe into hbase

2015-09-02 Thread Hafiz Mujadid
Hi 

What is an efficient way to save a DataFrame into HBase?

Thanks
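A minimal sketch (not from the thread), assuming an HBase 1.x client on the classpath, an existing table "mytable" with column family "cf", and the first column used as row key; all of these choices are hypothetical.

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes

df.foreachPartition { rows =>
  val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())   // one connection per partition
  val table = conn.getTable(TableName.valueOf("mytable"))
  rows.foreach { row =>
    val put = new Put(Bytes.toBytes(row.get(0).toString))
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("data"), Bytes.toBytes(row.mkString(",")))
    table.put(put)
  }
  table.close()
  conn.close()
}

Batching the puts, or using HBase's bulk-load path, is usually faster for large writes.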






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Save-dataframe-into-hbase-tp24552.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




wild cards in spark sql

2015-09-02 Thread Hafiz Mujadid
Hi

Does Spark SQL support wildcards for filtering data in SQL queries, just like we
can filter data in an RDBMS with wildcards such as % and ?? In other words, how can I
write the following query in Spark SQL?

select * from employee where ename like 'a%d'

thanks
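For what it is worth, Spark SQL's parser accepts the standard SQL LIKE wildcards, so the query can usually be run as-is; a sketch:

val matched = sqlContext.sql("SELECT * FROM employee WHERE ename LIKE 'a%d'")
// or through the DataFrame API (Spark 1.3+)
val same = sqlContext.table("employee").filter(org.apache.spark.sql.functions.col("ename").like("a%d"))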



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/wild-cards-in-spark-sql-tp24563.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Schema From parquet file

2015-09-01 Thread Hafiz Mujadid
Hi all!

Is there any way to get the schema from a Parquet file without loading it into a
dataframe?

Thanks
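A sketch, assuming Spark 1.4+ (sqlContext.parquetFile on older releases): creating the DataFrame is lazy and only touches the Parquet footer metadata, so the schema can be inspected without materialising any rows. The path is hypothetical.

val schema = sqlContext.read.parquet("/path/to/file.parquet").schema
schema.printTreeString()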



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Schema-From-parquet-file-tp24535.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




reading multiple parquet file using spark sql

2015-09-01 Thread Hafiz Mujadid
Hi 

I want to read multiple parquet files using the Spark SQL load method, just like
we can pass multiple comma-separated paths to the sc.textFile method. Is there
any way to do the same?


Thanks
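A sketch, assuming Spark 1.4+, where the parquet reader accepts several paths (the paths below are hypothetical):

val df = sqlContext.read.parquet("/data/p1.parquet", "/data/p2.parquet", "/data/p3.parquet")
// or, starting from a comma separated string like the one passed to sc.textFile:
val df2 = sqlContext.read.parquet("/data/p1.parquet,/data/p2.parquet".split(","): _*)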



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/reading-multiple-parquet-file-using-spark-sql-tp24537.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Writing test case for spark streaming checkpointing

2015-08-27 Thread Hafiz Mujadid
Hi!

I have enabled checkpointing in Spark Streaming with Kafka. I can see that
Spark Streaming is checkpointing to the configured directory on HDFS. How can
I test that it works fine and recovers with no data loss?


Thanks
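A minimal sketch of one way to exercise recovery (not a formal test): build the context with StreamingContext.getOrCreate, kill the driver mid-run, restart it, and check that processing resumes from the checkpointed offsets/state. The checkpoint path and app name are hypothetical.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs://host:9000/spark/checkpoints"
def createContext(): StreamingContext = {
  val ssc = new StreamingContext(new SparkConf().setAppName("checkpoint-test"), Seconds(10))
  ssc.checkpoint(checkpointDir)
  // build the Kafka DStream and its output operations here
  ssc
}
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()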



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Writing-test-case-for-spark-streaming-checkpointing-tp24475.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




giving offset in spark sql

2015-08-04 Thread Hafiz Mujadid
Hi all!

I want to skip the first n rows of a dataframe. In normal SQL this is done
with the OFFSET keyword. How can we achieve it in Spark SQL?

Thanks
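Spark SQL 1.x has no OFFSET clause; a common workaround (a sketch, with df and n hypothetical) is to index the rows and filter:

val withoutFirstN = sqlContext.createDataFrame(
  df.rdd.zipWithIndex.filter { case (_, idx) => idx >= n }.map { case (row, _) => row },
  df.schema)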



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/giving-offset-in-spark-sql-tp24130.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Custom partitioner

2015-07-26 Thread Hafiz Mujadid
Hi

I have CSV data containing a date-time column. I want to partition
my data into 12 partitions, with each partition containing the data of one month
only. I do not see how to write such a partitioner and how to use it
to read and write data.

Kindly help me in this regard.

Thanks
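A minimal sketch of such a partitioner, assuming the data is first keyed by its month number (1-12); extractMonth is a hypothetical parser for the date-time column, and csvRdd a hypothetical RDD of CSV lines.

import org.apache.spark.Partitioner

class MonthPartitioner extends Partitioner {
  override def numPartitions: Int = 12
  override def getPartition(key: Any): Int = key match {
    case month: Int => (month - 1) % 12   // months 1..12 map to partitions 0..11
    case _          => 0
  }
}

val keyed = csvRdd.map(line => (extractMonth(line), line))   // (month, record) pairs
val byMonth = keyed.partitionBy(new MonthPartitioner)

Each partition can then be written out separately, for example with saveAsTextFile after mapping away the key, or by filtering per month.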



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Custom-partitioner-tp24001.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException in spark with mysql database

2015-07-06 Thread Hafiz Mujadid
Hi!
I am trying to load data from a MySQL database using the following code:

val query = "select * from " + table + " "
val url = "jdbc:mysql://" + dataBaseHost + ":" + dataBasePort + "/" +
  dataBaseName + "?user=" + db_user + "&password=" + db_pass
val sc = new SparkContext(new
  SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
val options = new HashMap[String, String]()
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", url)
options.put("dbtable", query)
options.put("numPartitions", "1")
sqlContext.load("jdbc", options)

And I get the following exception:

Exception in thread main
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error
in your SQL syntax; check the manual that corresponds to your MySQL server
version for the right syntax to use near 'select * from  tempTable   WHERE
1=0'
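A sketch of the usual fix: the jdbc data source wraps whatever is passed as "dbtable" in SELECT * FROM <dbtable> WHERE 1=0, so it must be a table name or a parenthesised subquery with an alias, not a bare SELECT statement.

options.put("dbtable", "(select * from " + table + ") as tmp")
// or simply: options.put("dbtable", table)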



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/com-mysql-jdbc-exceptions-jdbc4-MySQLSyntaxErrorException-in-spark-with-mysql-database-tp23643.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Converting spark JDBCRDD to DataFrame

2015-07-06 Thread Hafiz Mujadid
Hi all!

What is the most efficient way to convert a JdbcRDD to a DataFrame?

Any example?

Thanks
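A sketch, assuming the existing JdbcRDD yields one sequence of column values per row and the column types are known (the schema below is hypothetical); alternatively the JdbcRDD can be skipped entirely and the jdbc data source used directly.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(Seq(StructField("id", IntegerType, true), StructField("name", StringType, true)))
val rowRdd = jdbcRdd.map(values => Row.fromSeq(values))
val df = sqlContext.createDataFrame(rowRdd, schema)

// or: val df2 = sqlContext.load("jdbc", Map("url" -> url, "driver" -> "com.mysql.jdbc.Driver", "dbtable" -> "mytable"))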



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Converting-spark-JDBCRDD-to-DataFrame-tp23647.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
Hi all!

I am trying to read records from offset 100 to 110 of a table using the
following piece of code:

val sc = new SparkContext(new
  SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
val sqlContext = new SQLContext(sc)
val options = new HashMap[String, String]()
options.put("driver", "com.mysql.jdbc.Driver")
options.put("url", "jdbc:mysql://***:3306/temp?user=&password=")
options.put("dbtable", "tempTable")
options.put("lowerBound", "100")
options.put("upperBound", "110")
options.put("numPartitions", "1")
sqlContext.load("jdbc", options)


but this returns all the records instead of only 10 records
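A note in sketch form (not from the thread): lowerBound and upperBound never filter rows; together with partitionColumn and numPartitions they only describe how the query is split into partition ranges, and all partitions together still cover the whole table. To actually read rows 100 to 110, the limit has to go into the query itself, for example:

options.put("dbtable", "(select * from tempTable limit 10 offset 100) as tmp")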




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/lower-and-upper-offset-not-working-in-spark-with-mysql-database-tp23635.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: lower and upper offset not working in spark with mysql database

2015-07-05 Thread Hafiz Mujadid
thanks

On Mon, Jul 6, 2015 at 10:46 AM, Manohar753 [via Apache Spark User List] 
ml-node+s1001560n23637...@n3.nabble.com wrote:

  I think you should mention partitionColumn like below, and the column type
 should be numeric. It works for my case.



 options.put("partitionColumn", "revision");





 Thanks,

 Manohar





 *From:* Hafiz Mujadid [via Apache Spark User List]
 *Sent:* Monday, July 6, 2015 10:56 AM
 *To:* Manohar Reddy
 *Subject:* lower and upper offset not working in spark with mysql database



 Hi all!

 I am trying to read records from offset 100 to 110 from a table using
 following piece of code

 val sc = new SparkContext(new
   SparkConf().setAppName("SparkJdbcDs").setMaster("local[*]"))
 val sqlContext = new SQLContext(sc)
 val options = new HashMap[String, String]()
 options.put("driver", "com.mysql.jdbc.Driver")
 options.put("url", "jdbc:mysql://***:3306/temp?user=&password=")
 options.put("dbtable", "tempTable")
 options.put("lowerBound", "100")
 options.put("upperBound", "110")
 options.put("numPartitions", "1")
 sqlContext.load("jdbc", options)


 but this returns all the records instead of only 10 records






-- 
Regards: HAFIZ MUJADID




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/lower-and-upper-offset-not-working-in-spark-with-mysql-database-tp23635p23638.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: making dataframe for different types using spark-csv

2015-07-02 Thread Hafiz Mujadid
Thanks

On Thu, Jul 2, 2015 at 5:40 PM, Kohler, Curt E (ELS-STL) 
c.koh...@elsevier.com wrote:

  You should be able to do something like this (assuming an input file
 formatted as:  String, IntVal, LongVal)


  import org.apache.spark.sql.types._

  val recSchema = StructType(List(StructField("strVal", StringType, false),
                                  StructField("intVal", IntegerType, false),
                                  StructField("longVal", LongType, false)))

  val filePath = "some path to your dataset"

  val df1 = sqlContext.load("com.databricks.spark.csv", recSchema,
    Map("path" -> filePath, "header" -> "false", "delimiter" -> ",", "mode" -> "FAILFAST"))

   From: Hafiz Mujadid hafizmujadi...@gmail.com
 Date: Wednesday, July 1, 2015 at 10:59 PM
 To: Mohammed Guller moham...@glassbeam.com
 Cc: Krishna Sankar ksanka...@gmail.com, user@spark.apache.org 
 user@spark.apache.org

 Subject: Re: making dataframe for different types using spark-csv

   hi Mohammed Guller!

  How can I specify schema in load method?



 On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com
 wrote:

  Another option is to provide the schema to the load method. One variant
 of sqlContext.load takes a schema as an input parameter. You can define
 the schema programmatically as shown here:




 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema



 Mohammed



 *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
 *Sent:* Wednesday, July 1, 2015 3:09 PM
 *To:* Hafiz Mujadid
 *Cc:* user@spark.apache.org
 *Subject:* Re: making dataframe for different types using spark-csv



 ·  use .cast(...).alias('...') after the DataFrame is read.

 ·  sql.functions.udf for any domain-specific conversions.

 Cheers

 k/



 On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com
 wrote:

 Hi experts!


  I am using spark-csv to load CSV data into a dataframe. By default it makes
  the type of each column string. Is there some way to get a dataframe of the
  actual types like int, double, etc.?


 Thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.







  --
 Regards: HAFIZ MUJADID




-- 
Regards: HAFIZ MUJADID


making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
Hi experts!


I am using spark-csv to load CSV data into a dataframe. By default it makes
the type of each column string. Is there some way to get a dataframe of the actual
types like int, double, etc.?


Thanks
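Besides passing an explicit schema (as suggested in the replies above), recent spark-csv versions can infer the column types; a sketch, with a hypothetical path:

val df = sqlContext.load("com.databricks.spark.csv",
  Map("path" -> "/path/to/data.csv", "header" -> "true", "inferSchema" -> "true"))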



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




coalesce on dataFrame

2015-07-01 Thread Hafiz Mujadid
How can we use coalesce(1, true) on dataFrame?


Thanks
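A sketch: DataFrame.coalesce(n) (Spark 1.4+) takes no shuffle flag, so to get the effect of coalesce(1, true) either repartition or drop to the RDD level.

val one = df.repartition(1)   // always shuffles, like coalesce(1, true) on an RDD
val viaRdd = sqlContext.createDataFrame(df.rdd.coalesce(1, shuffle = true), df.schema)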



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/coalesce-on-dataFrame-tp23564.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: making dataframe for different types using spark-csv

2015-07-01 Thread Hafiz Mujadid
hi Mohammed Guller!

How can I specify schema in load method?



On Thu, Jul 2, 2015 at 6:43 AM, Mohammed Guller moham...@glassbeam.com
wrote:

  Another option is to provide the schema to the load method. One variant
 of sqlContext.load takes a schema as an input parameter. You can define
 the schema programmatically as shown here:




 https://spark.apache.org/docs/latest/sql-programming-guide.html#programmatically-specifying-the-schema



 Mohammed



 *From:* Krishna Sankar [mailto:ksanka...@gmail.com]
 *Sent:* Wednesday, July 1, 2015 3:09 PM
 *To:* Hafiz Mujadid
 *Cc:* user@spark.apache.org
 *Subject:* Re: making dataframe for different types using spark-csv



 ·  use .cast(...).alias('...') after the DataFrame is read.

 ·  sql.functions.udf for any domain-specific conversions.

 Cheers

 k/



 On Wed, Jul 1, 2015 at 11:03 AM, Hafiz Mujadid hafizmujadi...@gmail.com
 wrote:

 Hi experts!


  I am using spark-csv to load CSV data into a dataframe. By default it makes
  the type of each column string. Is there some way to get a dataframe of the actual
  types like int, double, etc.?


 Thanks



 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/making-dataframe-for-different-types-using-spark-csv-tp23570.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.







-- 
Regards: HAFIZ MUJADID


flume sinks supported by spark streaming

2015-06-23 Thread Hafiz Mujadid
Hi!


I want to integrate Flume with Spark Streaming. I want to know which Flume sink
types are supported by Spark Streaming. I saw one example using
an Avro sink.

Thanks
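For reference, Spark Streaming documents two Flume integrations: the push-based approach, where a standard Flume Avro sink points at the receiver, and the pull-based approach using Spark's own sink (org.apache.spark.streaming.flume.sink.SparkSink) with createPollingStream. A sketch with hypothetical host and port:

import org.apache.spark.streaming.flume.FlumeUtils

val flumeStream = FlumeUtils.createStream(ssc, "receiver-host", 4141)        // push-based (avro sink)
// val polled   = FlumeUtils.createPollingStream(ssc, "sink-host", 4141)     // pull-based (SparkSink)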





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/flume-sinks-supported-by-spark-streaming-tp23462.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




cassandra with jdbcRDD

2015-06-16 Thread Hafiz Mujadid
hi all!


Is there a way to connect to Cassandra with JdbcRDD?
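Cassandra does not speak JDBC natively, so JdbcRDD is rarely the route; the usual approach is the DataStax spark-cassandra-connector. A sketch with hypothetical keyspace and table names:

import com.datastax.spark.connector._

val rows = sc.cassandraTable("my_keyspace", "my_table")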





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/cassandra-with-jdbcRDD-tp23335.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




redshift spark

2015-06-05 Thread Hafiz Mujadid
Hi All,

I want to read and write data to AWS Redshift. I found the spark-redshift
project at the following address:
https://github.com/databricks/spark-redshift

In its documentation the following code is given:
import com.databricks.spark.redshift.RedshiftInputFormat

val records = sc.newAPIHadoopFile(
  path,
  classOf[RedshiftInputFormat],
  classOf[java.lang.Long],
  classOf[Array[String]])

I am unable to understand its parameters. Can somebody explain how to use
this? What is meant by path in this case?

thanks
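For context (a sketch, the value below is hypothetical): RedshiftInputFormat parses text files that were exported from Redshift with UNLOAD ... ESCAPE, so path is the S3 (or HDFS) directory holding those unloaded files.

val path = "s3n://my-bucket/unloaded/members/"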



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/redshift-spark-tp23175.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




setting spark configuration properties problem

2015-05-05 Thread Hafiz Mujadid
Hi all,

I have declared the Spark context at the start of my program and then I want to
change its configuration at some later stage in my code, as written below:

val conf = new SparkConf().setAppName("Cassandra Demo")
var sc: SparkContext = new SparkContext(conf)
sc.getConf.set("spark.cassandra.connection.host", host)
println(sc.getConf.get("spark.cassandra.connection.host"))

In the above code the following exception occurs:

Exception in thread main java.util.NoSuchElementException:
spark.cassandra.connection.host

Actually I want to set the Cassandra host property only if there is some job
related to Cassandra, so I declare the SparkContext earlier and then want to set
this property at some later stage.

Any suggestion?
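A sketch of the usual workaround: sc.getConf returns a copy, and most properties are read when the context is created, so the Cassandra host has to be set on the SparkConf before new SparkContext (or supplied per read, if the connector in use allows that). The flag below is hypothetical.

val conf = new SparkConf().setAppName("Cassandra Demo")
if (cassandraJobRequested)
  conf.set("spark.cassandra.connection.host", host)   // set before the context is created
val sc = new SparkContext(conf)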



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/setting-spark-configuration-properties-problem-tp22764.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




empty jdbc RDD in spark

2015-05-02 Thread Hafiz Mujadid
Hi all!
I am trying to read a HANA database using Spark's JdbcRDD.
Here is my code:
def readFromHana() {
    val conf = new SparkConf()
    conf.setAppName("test").setMaster("local")
    val sc = new SparkContext(conf)
    val rdd = new JdbcRDD(sc, () => {
        Class.forName("com.sap.db.jdbc.Driver").newInstance()
        DriverManager.getConnection("jdbc:sap://54.69.200.113:30015/?currentschema=LIVE2",
          "mujadid", "786Xyz123")
      },
      "SELECT *  FROM MEMBERS LIMIT ? OFFSET  ?",
      0, 100, 1,
      (r: ResultSet) => convert(r))
    println(rdd.count());
    sc.stop()
  }
  def convert(rs: ResultSet): String = {
    val rsmd = rs.getMetaData()
    val numberOfColumns = rsmd.getColumnCount()
    var i = 1
    val row = new StringBuilder
    while (i <= numberOfColumns) {
      row.append(rs.getString(i) + ",")
      i += 1
    }
    row.toString()
  }

The resultant count is 0

Any suggestion?
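A sketch of what is probably happening: JdbcRDD binds each partition's lower and upper bound into the two '?' placeholders in order, so with (0, 100, 1) the statement becomes LIMIT 0 OFFSET 100, i.e. a limit of zero rows. Putting the placeholders on a numeric key range avoids this (the id column is hypothetical; convert is the mapper from above):

val rdd = new JdbcRDD(sc,
  () => DriverManager.getConnection("jdbc:sap://54.69.200.113:30015/?currentschema=LIVE2", "mujadid", "786Xyz123"),
  "SELECT * FROM MEMBERS WHERE id >= ? AND id <= ?",
  100, 110, 1,
  (r: ResultSet) => convert(r))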

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/empty-jdbc-RDD-in-spark-tp22736.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




sap hana database load using jdbcRDD

2015-04-30 Thread Hafiz Mujadid
Hi !

Can we load hana database table using spark jdbc RDD?

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/sap-hana-database-laod-using-jdbcRDD-tp22726.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




saving schemaRDD to cassandra

2015-03-27 Thread Hafiz Mujadid
Hi experts!

I would like to know whether there is any way to store a SchemaRDD to Cassandra.
If yes, how do I store it into an existing Cassandra column family and into a new
column family?

Thanks
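A sketch with the DataStax spark-cassandra-connector, assuming the keyspace, table and columns already exist (all names hypothetical); for a new column family the table has to be created in Cassandra first, or written with the connector's saveAsCassandraTable.

import com.datastax.spark.connector._

schemaRDD.map(row => (row.getInt(0), row.getString(1)))
  .saveToCassandra("my_keyspace", "my_table", SomeColumns("id", "name"))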




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/saving-schemaRDD-to-cassandra-tp22256.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Downloading data from url

2015-03-17 Thread Hafiz Mujadid
Hi experts!


Is there any API in Spark to download data from a URL? I want to download data
from URLs in a Spark application, and I want the downloading to happen on all nodes
instead of a single node.

Thanks
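There is no dedicated download API in Spark; a common sketch is to parallelise the URL list so each executor fetches its share (the URLs below are hypothetical):

val urls = sc.parallelize(Seq("http://example.com/a.csv", "http://example.com/b.csv"))
val contents = urls.map(u => (u, scala.io.Source.fromURL(u).mkString))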



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Downloading-data-from-url-tp22102.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




connecting spark application with SAP hana

2015-03-12 Thread Hafiz Mujadid
Hi experts!

Is there any way to connect to SAP HANA in a Spark application and get data from
HANA tables in our Spark application?


Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/connecting-spark-application-with-SAP-hana-tp22011.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




spark streaming, batchinterval,windowinterval and window sliding interval difference

2015-02-26 Thread Hafiz Mujadid
Can somebody explain the difference between the
batch interval, the window interval and the window sliding interval, with an example?
Is there any real-time use case for these parameters?


Thanks
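A sketch of how the three relate (stream is a hypothetical DStream): the batch interval is fixed when the StreamingContext is created, while the window length and sliding interval are per-operation and must be multiples of it.

val ssc = new StreamingContext(conf, Seconds(10))          // batch interval: one RDD every 10s
val windowed = stream.window(Seconds(60), Seconds(20))     // every 20s, look at the last 60s (6 batches)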



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-batchinterval-windowinterval-and-window-sliding-interval-difference-tp21829.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




running spark project using java -cp command

2015-02-09 Thread Hafiz Mujadid
hi experts!

Is there any way to run a Spark application using the java -cp command?


thanks 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/running-spark-project-using-java-cp-command-tp21567.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




using spark in web services

2015-02-09 Thread Hafiz Mujadid
Hi experts! I am trying to use Spark in my RESTful web services. I am using the
Scala Lift framework for writing the web services. Here is my boot class:
class Boot extends Bootable {
  def boot {
    Constants.loadConfiguration
    val sc = new SparkContext(new
      SparkConf().setMaster("local").setAppName("services"))
    // Binding Service as a Restful API
    LiftRules.statelessDispatchTable.append(RestfulService);
    // resolve the trailing slash issue
    LiftRules.statelessRewrite.prepend({
      case RewriteRequest(ParsePath(path, _, _, true), _, _) if path.last ==
        "index" => RewriteResponse(path.init)
    })

  }
}


When I remove this line

val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("services"))

then it works fine.
I am starting the services using the command

java -jar start.jar jetty.port=

and get the following exception:


ERROR net.liftweb.http.provider.HTTPProvider - Failed to Boot! Your
application may not run properly
java.lang.NoClassDefFoundError:
org/eclipse/jetty/server/handler/ContextHandler$NoContext
at
org.eclipse.jetty.servlet.ServletContextHandler.newServletHandler(ServletContextHandler.java:260)
at
org.eclipse.jetty.servlet.ServletContextHandler.getServletHandler(ServletContextHandler.java:322)
at
org.eclipse.jetty.servlet.ServletContextHandler.relinkHandlers(ServletContextHandler.java:198)
at
org.eclipse.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:157)
at
org.eclipse.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:135)
at
org.eclipse.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:129)
at
org.eclipse.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:99)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:96)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:87)
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:67)
at
org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:60)
at
org.apache.spark.ui.WebUI$$anonfun$attachTab$1.apply(WebUI.scala:60)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.ui.WebUI.attachTab(WebUI.scala:60)
at org.apache.spark.ui.SparkUI.initialize(SparkUI.scala:50)
at org.apache.spark.ui.SparkUI.init(SparkUI.scala:63)



Any suggestion please?

Am I using the right command to run this?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/using-spark-in-web-services-tp21550.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




LeaseExpiredException while writing schemardd to hdfs

2015-02-03 Thread Hafiz Mujadid
I want to write a whole SchemaRDD to a single file in HDFS, but I am facing the
following exception:

rg.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /test/data/data1.csv (inode 402042): File does not exist. Holder
DFSClient_NONMAPREDUCE_-564238432_57 does not have any open files 

Here is my code:

rdd.foreachPartition( iterator => {
      var output = new Path( outputpath )
      val fs = FileSystem.get( new Configuration() )
      var writer : BufferedWriter = null
      writer = new BufferedWriter( new OutputStreamWriter( fs.create( output ) ) )
      var line = new StringBuilder
      iterator.foreach( row => {
        row.foreach( column => {
          line.append( column.toString + splitter )
        } )
        writer.write( line.toString.dropRight( 1 ) )
        writer.newLine()
        line.clear
      } )
      writer.close()
} )

I think the problem is that I am creating a writer for each partition, and multiple
writers execute in parallel; when they try to write to the same file, this problem
appears.
When I avoid this approach I face a Task not serializable exception instead.

Any suggestion to handle this problem?
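A sketch of the usual workaround: let Spark do the writing so that each partition gets its own part-file, or collapse to a single partition first, instead of several executors opening the same HDFS path.

rdd.map(row => row.mkString(splitter)).coalesce(1).saveAsTextFile(outputpath)   // one part-file under outputpath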



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/LeaseExpiredException-while-writing-schemardd-to-hdfs-tp21477.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Linkage error - duplicate class definition

2015-01-20 Thread Hafiz Mujadid
Have you solved this problem?




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Linkage-error-duplicate-class-definition-tp9482p21260.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




spark streaming kinesis issue

2015-01-20 Thread Hafiz Mujadid
Hi experts!

I am using Spark Streaming with Kinesis and I get this exception while
running the program:

 java.lang.LinkageError: loader (instance of 
org/apache/spark/executor/ChildExecutorURLClassLoader$userClassLoader$):
attempted  duplicate class definition for name:
com/amazonaws/services/kinesis/clientlibrary/lib/worker/InitialPositionInStream
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)



Is there any solution to this problem?





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-kinesis-issue-tp21262.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain

2015-01-19 Thread Hafiz Mujadid
Hi all!

I am trying to use Kinesis and Spark Streaming together. When I execute the
program I get the exception com.amazonaws.AmazonClientException: Unable to load
AWS credentials from any provider in the chain.


Here is my piece of code

val credentials = new
BasicAWSCredentials(KinesisProperties.AWS_ACCESS_KEY_ID,
KinesisProperties.AWS_SECRET_KEY)

var kinesisClient: AmazonKinesisClient = new
AmazonKinesisClient(credentials)


kinesisClient.setEndpoint(KinesisProperties.KINESIS_ENDPOINT_URL,
KinesisProperties.KINESIS_SERVICE_NAME,
KinesisProperties.KINESIS_REGION_ID)
        System.setProperty("aws.accessKeyId", KinesisProperties.AWS_ACCESS_KEY_ID)
        System.setProperty("aws.secretKey", KinesisProperties.AWS_SECRET_KEY)
        System.setProperty("AWS_ACCESS_KEY_ID", KinesisProperties.AWS_ACCESS_KEY_ID)
        System.setProperty("AWS_SECRET_KEY", KinesisProperties.AWS_SECRET_KEY)
val numShards =
kinesisClient.describeStream(KinesisProperties.MY_STREAM_NAME)
.getStreamDescription().getShards().size()
val numStreams = numShards
val ssc = StreamingHelper.getStreamingInstance(new
Duration(KinesisProperties.KINESIS_CHECKPOINT_INTERVAL))
ssc.addStreamingListener(new MyStreamListener)
val kinesisStreams = (0 until numStreams).map { i =>
KinesisUtils.createStream(ssc, 
KinesisProperties.MY_STREAM_NAME,
KinesisProperties.KINESIS_ENDPOINT_URL,
new 
Duration(KinesisProperties.KINESIS_CHECKPOINT_INTERVAL),
InitialPositionInStream.TRIM_HORIZON,
StorageLevel.MEMORY_AND_DISK_2)
}
/* Union all the streams */
val unionStreams = ssc.union(kinesisStreams)
val tmp_stream = unionStreams.map(byteArray => new String(byteArray))
val 
data=tmp_stream.window(Seconds(KinesisProperties.WINDOW_INTERVAL ),
Seconds(KinesisProperties.SLIDING_INTERVAL))
data.foreachRDD((rdd: RDD[String], time: Time) => {
if (rdd.take(1).size == 1) {
rdd.saveAsTextFile(KinesisProperties.Sink + 
time.milliseconds)
}
})
ssc.start()
ssc.awaitTermination()



Any suggestion?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/com-amazonaws-AmazonClientException-Unable-to-load-AWS-credentials-from-any-provider-in-the-chain-tp21255.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




kinesis multiple records adding into stream

2015-01-16 Thread Hafiz Mujadid
Hi Experts!

I am using the Kinesis dependency as follows:
groupId = org.apache.spark
 artifactId = spark-streaming-kinesis-asl_2.10
 version = 1.2.0

This pulls in AWS SDK version 1.8.3, in which multiple records cannot be put
in a single request. Is it possible to put multiple records in a single
request?


thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/kinesis-multiple-records-adding-into-stream-tp21191.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
thanks

On Thu, Jan 15, 2015 at 7:35 PM, Prannoy [via Apache Spark User List] 
ml-node+s1001560n21163...@n3.nabble.com wrote:

 Hi,

  You can put the schema line in another RDD and then do a union of the two
  RDDs.

 List<String> schemaList = new ArrayList<String>();
 schemaList.add("xyz");

 // where "xyz" is your schema line

 JavaRDD<String> schemaRDD = sc.parallelize(schemaList);

 // where sc is your sparkcontext

 JavaRDD<String> newRDD = schemaRDD.union(yourRDD);

 // where yourRDD is your other RDD, at the start of which you want to add the
 // schema line.

 The code is in java, you can change it to scala 

 Thanks.





 On Thu, Jan 15, 2015 at 7:46 PM, Hafiz Mujadid [via Apache Spark User
 List] wrote:

 hi experts!

 I hav an RDD[String] and i want to add schema line at beginning in this
 rdd. I know RDD is immutable. So is there anyway to have a new rdd with one
 schema line and contents of previous rdd?


 Thanks









-- 
Regards: HAFIZ MUJADID




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Inserting-an-element-in-RDD-String-tp21161p21165.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

kinesis creating stream scala code exception

2015-01-15 Thread Hafiz Mujadid
Hi experts! I want to consume data from a Kinesis stream using Spark Streaming.
I am trying to create the Kinesis stream using Scala code. Here is my code:

def main(args: Array[String]) {
  println("Stream creation started")
  if (create(2))
    println("Stream is created successfully")
}

def create(shardCount: Int): Boolean = {
  val credentials = new BasicAWSCredentials(KinesisProperties.AWS_ACCESS_KEY_ID,
    KinesisProperties.AWS_SECRET_KEY)

  var kinesisClient: AmazonKinesisClient = new AmazonKinesisClient(credentials)
  kinesisClient.setEndpoint(KinesisProperties.KINESIS_ENDPOINT_URL,
    KinesisProperties.KINESIS_SERVICE_NAME,
    KinesisProperties.KINESIS_REGION_ID)
  val createStreamRequest = new CreateStreamRequest()
  createStreamRequest.setStreamName(KinesisProperties.myStreamName)
  createStreamRequest.setShardCount(shardCount)
  val describeStreamRequest = new DescribeStreamRequest()
  describeStreamRequest.setStreamName(KinesisProperties.myStreamName)
  try {
    Thread.sleep(12)
  } catch {
    case e: Exception =>
  }
  var streamStatus = "not active"
  while (!streamStatus.equalsIgnoreCase("ACTIVE")) {
    try {
      Thread.sleep(1000)
    } catch {
      case e: Exception => e.printStackTrace()
    }
    try {
      val describeStreamResponse = kinesisClient.describeStream(describeStreamRequest)
      streamStatus = describeStreamResponse.getStreamDescription.getStreamStatus
    } catch {
      case e: Exception => e.printStackTrace()
    }
  }
  if (streamStatus.equalsIgnoreCase("ACTIVE"))
    true
  else
    false
}


When I run this code I get following exception

Exception in thread main java.lang.NoSuchMethodError:
org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter;
at com.amazonaws.auth.AWS4Signer.clinit(AWS4Signer.java:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at java.lang.Class.newInstance(Class.java:379)
at
com.amazonaws.auth.SignerFactory.createSigner(SignerFactory.java:119)
at
com.amazonaws.auth.SignerFactory.lookupAndCreateSigner(SignerFactory.java:105)
at com.amazonaws.auth.SignerFactory.getSigner(SignerFactory.java:78)
at
com.amazonaws.AmazonWebServiceClient.computeSignerByServiceRegion(AmazonWebServiceClient.java:307)
at
com.amazonaws.AmazonWebServiceClient.computeSignerByURI(AmazonWebServiceClient.java:280)
at
com.amazonaws.AmazonWebServiceClient.setEndpoint(AmazonWebServiceClient.java:160)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.setEndpoint(AmazonKinesisClient.java:2102)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:216)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:139)
at
com.amazonaws.services.kinesis.AmazonKinesisClient.init(AmazonKinesisClient.java:116)
at
com.platalytics.platform.connectors.Kinesis.App$.create(App.scala:32)
at
com.platalytics.platform.connectors.Kinesis.App$.main(App.scala:26)
at com.platalytics.platform.connectors.Kinesis.App.main(App.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)



I have following maven dependency
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kinesis-asl_2.10</artifactId>
  <version>1.2.0</version>
</dependency>


Any suggestion?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/kinesis-creating-stream-scala-code-exception-tp21154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Inserting an element in RDD[String]

2015-01-15 Thread Hafiz Mujadid
hi experts!

I have an RDD[String] and I want to add a schema line at the beginning of this RDD.
I know RDDs are immutable. So is there any way to have a new RDD with one
schema line followed by the contents of the previous RDD?


Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Inserting-an-element-in-RDD-String-tp21161.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
Actually this code is producing a "leader not found" exception. I am
unable to find the reason.

On Mon, Jan 12, 2015 at 4:03 PM, kevinkim [via Apache Spark User List] 
ml-node+s1001560n21098...@n3.nabble.com wrote:

 Well, you can use coalesce() to decrease number of partition to 1.
 (It will take time and quite not efficient, tough)

 Regards,
 Kevin.

  On Mon Jan 12 2015 at 7:57:39 PM Hafiz Mujadid [via Apache Spark User
  List] wrote:

 Hi experts!


 I have a schemaRDD of messages to be pushed in kafka. So I am using
 following piece of code to do that

 rdd.foreachPartition(itr = {
 val props = new Properties()
 props.put(metadata.broker.list,
 brokersList)
 props.put(serializer.class,
 kafka.serializer.StringEncoder)
 props.put(compression.codec,
 codec.toString)
 props.put(producer.type, sync)
 props.put(batch.num.messages,
 BatchSize.toString)
 props.put(message.send.max.retries,
 maxRetries.toString)
 props.put(request.required.acks, -1)
 producer = new Producer[String,
 String](new ProducerConfig(props))
 itr.foreach(row = {
 val msg =
 row.toString.drop(1).dropRight(1)
 this.synchronized {
 producer.send(new
 KeyedMessage[String, String](Topic, msg))
 }
 })
 producer.close
 })



 the problem with this code is that it creates kafka producer separate for
 each partition and I want a single producer for all partitions. Is there
 any way to achieve this?









-- 
Regards: HAFIZ MUJADID




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/creating-a-single-kafka-producer-object-for-all-partitions-tp21097p21099.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

creating a single kafka producer object for all partitions

2015-01-12 Thread Hafiz Mujadid
Hi experts!


I have a SchemaRDD of messages to be pushed into Kafka, and I am using the
following piece of code to do that:

rdd.foreachPartition(itr => {
  val props = new Properties()
  props.put("metadata.broker.list", brokersList)
  props.put("serializer.class", "kafka.serializer.StringEncoder")
  props.put("compression.codec", codec.toString)
  props.put("producer.type", "sync")
  props.put("batch.num.messages", BatchSize.toString)
  props.put("message.send.max.retries", maxRetries.toString)
  props.put("request.required.acks", "-1")
  producer = new Producer[String, String](new ProducerConfig(props))
  itr.foreach(row => {
    val msg = row.toString.drop(1).dropRight(1)
    this.synchronized {
      producer.send(new KeyedMessage[String, String](Topic, msg))
    }
  })
  producer.close
})



The problem with this code is that it creates a separate Kafka producer for
each partition, and I want a single producer for all partitions. Is there any
way to achieve this?
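A single producer shared by all partitions is not possible when partitions run on different machines, but one producer per executor JVM (instead of one per partition) can be obtained with a lazily initialised holder object; a sketch, with createProps() left as a hypothetical copy of the settings above:

import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

object KafkaProducerHolder {
  lazy val producer = new Producer[String, String](new ProducerConfig(createProps()))
  private def createProps(): Properties = { val p = new Properties(); /* same settings as above */ p }
}

rdd.foreachPartition { itr =>
  itr.foreach { row =>
    val msg = row.toString.drop(1).dropRight(1)
    KafkaProducerHolder.producer.send(new KeyedMessage[String, String](Topic, msg))
  }
}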




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/creating-a-single-kafka-producer-object-for-all-partitions-tp21097.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




skipping header from each file

2015-01-08 Thread Hafiz Mujadid
Suppose I give three file paths to the Spark context to read, and each file has a
schema line in its first row. How can we skip these header lines?


val rdd = sc.textFile("file1,file2,file3")
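A sketch of the simplest approach when every file starts with the same header: read normally and filter the header line out, which works no matter how the files map to partitions.

val rdd = sc.textFile("file1,file2,file3")
val header = rdd.first()            // assumes the very first line is a header and all headers are identical
val data = rdd.filter(_ != header)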



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/skipping-header-from-each-file-tp21051.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




max receiving rate in spark streaming

2015-01-07 Thread Hafiz Mujadid
Hi experts!


Is there any way to decide what an effective receiving rate for Kafka with
Spark Streaming would be?

Thanks
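A sketch of the relevant configuration: receiver-based streams can be capped per receiver (in records per second), and Spark 1.5+ can also adapt the rate automatically via backpressure. The value below is hypothetical.

val conf = new SparkConf()
  .set("spark.streaming.receiver.maxRate", "10000")
  .set("spark.streaming.backpressure.enabled", "true")   // 1.5+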



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/max-receiving-rate-in-spark-streaming-tp21013.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




stopping streaming context

2015-01-05 Thread Hafiz Mujadid
Hi experts!

Is there any way to stop a Spark streaming context after 5 batches have
completed?


Thanks
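A sketch (names hypothetical, ssc being the running StreamingContext): count completed batches with a StreamingListener and stop the context from a separate thread once five have finished.

import java.util.concurrent.atomic.AtomicInteger
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

val finished = new AtomicInteger(0)
ssc.addStreamingListener(new StreamingListener {
  override def onBatchCompleted(batch: StreamingListenerBatchCompleted): Unit = {
    if (finished.incrementAndGet() == 5) {
      new Thread() { override def run(): Unit = ssc.stop(stopSparkContext = true, stopGracefully = true) }.start()
    }
  }
})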



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/stopping-streaming-context-tp20970.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




getting number of partition per topic in kafka

2015-01-03 Thread Hafiz Mujadid
Hi experts!

I am currently working on Spark Streaming with Kafka. I have a couple of
questions related to this task.

1) Is there a way to find the number of partitions for a given topic name?
2) Is there a way to detect whether the Kafka server is running or not?


Thanks 




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/getting-number-of-partition-per-topic-in-kafka-tp20952.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4

2014-12-31 Thread Hafiz Mujadid
I am accessing HDFS with Spark's .textFile method and I receive the following error:

Exception in thread main org.apache.hadoop.ipc.RemoteException: Server IPC
version 9 cannot communicate with client version 4


here are my dependencies 
http://apache-spark-user-list.1001560.n3.nabble.com/file/n20925/Untitled.png 


Any suggestion ? 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Exception-in-thread-main-org-apache-hadoop-ipc-RemoteException-Server-IPC-version-9-cannot-communica4-tp20925.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
hi dears!

Is there some efficient way to drop the first line of an RDD[String]?

any suggestion?
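A sketch: drop the first element of the first partition only, which avoids shuffling the whole RDD.

val withoutFirst = rdd.mapPartitionsWithIndex { (idx, it) =>
  if (idx == 0) it.drop(1) else it
}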

Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: removing first record from RDD[String]

2014-12-23 Thread Hafiz Mujadid
Yep Michael Quinlan, it's working as suggested by Hoe Ren.

Thanks to you and Hoe Ren.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/removing-first-record-from-RDD-String-tp20834p20840.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




SchemaRDD to RDD[String]

2014-12-23 Thread Hafiz Mujadid
Hi dears!

I want to convert a SchemaRDD into an RDD of String. How can we do that?

Currently I am doing it like this, which is not converting correctly: there is no
exception, but the resultant strings are empty.


Here is my code:

def SchemaRDDToRDD( schemaRDD : SchemaRDD ) : RDD[ String ] = {
  var types = schemaRDD.schema.fields.map( field => field.dataType )
  var list = new ArrayList[ String ]()
  types.foreach( Type => {
    list.add( Type.toString )
  } )
  schemaRDD.map( row => rowToString( row, list ) )
}

private def rowToString( row : Row, list : ArrayList[ String ] ) : String = {
  var record = new StringBuilder
  for ( i <- 0 until row.length ) {
    record.append( getValue( list.get( i ), row, i ) )
    record.append( "," )
  }
  record.setLength( record.length - 1 )   // drop the trailing comma; setLength returns Unit,
  val sub = record.toString               // so calling toString on its result would not give the row
  println( sub )
  sub
}

/** get a single value from row object
 *  @param dataType of value
 *  @param record of type Row
 *  @param i index of column in row object
 *  @return value in string
 */
private def getValue( dataType : String, record : Row, i : Int ) : String = {
  var res = ""
  if ( dataType.equalsIgnoreCase( "IntegerType" ) )
    res += record.getInt( i )
  else if ( dataType.equalsIgnoreCase( "FloatType" ) )
    res += record.getFloat( i )
  else if ( dataType.equalsIgnoreCase( "LongType" ) )
    res += record.getLong( i )
  else if ( dataType.equalsIgnoreCase( "DoubleType" ) )
    res += record.getDouble( i )
  else if ( dataType.equalsIgnoreCase( "TimestampType" ) )
    res += record.getString( i )
  else if ( dataType.equalsIgnoreCase( "ShortType" ) )
    res += record.getShort( i )
  else if ( dataType.equalsIgnoreCase( "ByteType" ) )
    res += record.getByte( i )
  else if ( dataType.equalsIgnoreCase( "StringType" ) )
    res = record.getString( i )
  else
    res = record.getString( i )
  println( res )
  res
}
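A much shorter route, assuming the default string form of each value is acceptable:

val strings: RDD[String] = schemaRDD.map(_.mkString(","))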







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-to-RDD-String-tp20846.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
Hi experts!

What is an efficient way to read all files from a directory and its
sub-directories using Spark? Currently I move all files from the directory and its
sub-directories into another temporary directory and then read them all
using the sc.textFile method, but I want a method that avoids the cost of moving
them to a temporary directory.

Thanks 



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/reading-files-recursively-using-spark-tp20782.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: reading files recursively using spark

2014-12-19 Thread Hafiz Mujadid
thanks bethesda!

But if we have a structure like this:

a/b/a.txt
a/c/c.txt
a/d/e/e.txt

then how can we handle this case?
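A sketch for the layout above: textFile accepts comma separated glob patterns, one per directory depth; depending on the Hadoop version, the input format can also be told to recurse.

val rdd = sc.textFile("a/*/*.txt,a/*/*/*.txt")
// some Hadoop 2 setups also honour:
// sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")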




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/reading-files-recursively-using-spark-tp20782p20785.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Saving Data only if DStream is not empty

2014-12-08 Thread Hafiz Mujadid
Hi Experts!

I want to save a DStream to HDFS only if it is not empty, i.e. only when it contains
some Kafka messages to be stored. What is an efficient way to do this?

    var data = KafkaUtils.createStream[Array[Byte], Array[Byte],
      DefaultDecoder, DefaultDecoder](ssc, params, topicMap,
      StorageLevel.MEMORY_ONLY).map(_._2)

    val streams = data.window(Seconds(interval * 4),
      Seconds(interval * 2)).map(x => new String(x))
    // streams.foreachRDD(rdd => rdd.foreach(println))

    // what condition can be applied here to store only a non-empty DStream?
    streams.saveAsTextFiles(sink, msg)
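
One possible condition, sketched under the assumption that deciding per batch inside foreachRDD is acceptable; rdd.take(1) keeps the emptiness check cheap compared to a full count, and the output path built from the batch time is only an illustration:

// Write each windowed batch only when it actually contains records.
streams.foreachRDD { (rdd, time) =>
  if (rdd.take(1).nonEmpty) {
    rdd.saveAsTextFile(s"$sink/batch-${time.milliseconds}")   // `sink` as in the snippet above
  }
}
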
Thanks




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Saving-Data-only-if-Dstream-is-not-empty-tp20587.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Hi!

Does anybody have a useful example of the StreamingListener interface? When and
how can we use this interface to stop streaming once one batch of data has been
processed?
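
A minimal sketch of one way the listener could be wired up, assuming it is acceptable to block the main thread on a latch and call ssc.stop there once the first completed batch is reported (stopping from inside the listener callback itself can block the scheduler). The class and helper names are hypothetical:

import java.util.concurrent.CountDownLatch
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.scheduler.{StreamingListener, StreamingListenerBatchCompleted}

// Counts down once the first batch has finished processing.
class FirstBatchListener(latch: CountDownLatch) extends StreamingListener {
  override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit =
    latch.countDown()
}

// Driver-side helper: run until one batch completes, then stop.
def runSingleBatch(ssc: StreamingContext): Unit = {
  val latch = new CountDownLatch(1)
  ssc.addStreamingListener(new FirstBatchListener(latch))
  ssc.start()
  latch.await()                       // wait for the first completed batch
  ssc.stop(stopSparkContext = false)  // stop streaming, keep the SparkContext
}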

Thanks a lot



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Example-usage-of-StreamingListener-tp20357.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: running Spark Streaming just once and stop it

2014-12-04 Thread Hafiz Mujadid
Hi Kal El!

Have you managed to stop streaming after the first iteration? If yes, can you
share the example code?


Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/running-Spark-Streaming-just-once-and-stop-it-tp1382p20359.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Example usage of StreamingListener

2014-12-04 Thread Hafiz Mujadid
Thanks Akhil, you have been very helpful.




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Example-usage-of-StreamingListener-tp20357p20362.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Experts!

Is there a way to read the first N messages from a Kafka stream, put them in
some collection, return them to the caller for visualization purposes, and then
close Spark Streaming?

I will be glad to hear from you and will be thankful to you.

Currently I have the following code:

def getsample(params: scala.collection.immutable.Map[String, String]): Unit = {
    if (params.contains("zookeeperQourum"))
      zkQuorum = params.get("zookeeperQourum").get
    if (params.contains("userGroup"))
      group = params.get("userGroup").get
    if (params.contains("topics"))
      topics = params.get("topics").get
    if (params.contains("numberOfThreads"))
      numThreads = params.get("numberOfThreads").get
    if (params.contains("sink"))
      sink = params.get("sink").get
    if (params.contains("batchInterval"))
      interval = params.get("batchInterval").get.toInt
    val sparkConf = new
      SparkConf().setAppName("KafkaConsumer").setMaster("spark://cloud2-server:7077")
    val ssc = new StreamingContext(sparkConf, Seconds(interval))
    val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
    var consumerConfig = scala.collection.immutable.Map.empty[String, String]
    consumerConfig += ("auto.offset.reset" -> "smallest")
    consumerConfig += ("zookeeper.connect" -> zkQuorum)
    consumerConfig += ("group.id" -> group)
    var data = KafkaUtils.createStream[Array[Byte], Array[Byte],
      DefaultDecoder, DefaultDecoder](ssc, consumerConfig, topicMap,
      StorageLevel.MEMORY_ONLY).map(_._2)
    val streams = data.window(Seconds(interval), Seconds(interval)).map(x =>
      new String(x))
    streams.foreach(rdd => rdd.foreachPartition(itr => {
      while (itr.hasNext && size >= 0) {
        val msg = itr.next
        println(msg)
        sample.append(msg)
        sample.append("\n")
        size -= 1
      }
    }))
    ssc.start()
    ssc.awaitTermination(5000)
    ssc.stop(true)
  }

Here sample is a StringBuilder. When I print the contents of this StringBuilder
after the getsample method call returns, I get nothing in it.


Any help will be appreciated  



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/getting-firs-N-messages-froma-Kafka-topic-using-Spark-Streaming-tp20227.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: getting first N messages from a Kafka topic using Spark Streaming

2014-12-03 Thread Hafiz Mujadid
Hi Akhil!

Thanks for your response. Can you please suggest how to return this sample from
the function to the caller, and how to stop Spark Streaming afterwards?
Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/getting-firs-N-messages-froma-Kafka-topic-using-Spark-Streaming-tp20227p20249.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Hi everyone!

I want to convert a DStream[String] into an RDD[String]. I could not find out how
to do this.

var data = KafkaUtils.createStream[Array[Byte], Array[Byte], DefaultDecoder,
  DefaultDecoder](ssc, consumerConfig, topicMap,
  StorageLevel.MEMORY_ONLY).map(_._2)
val streams = data.window(Seconds(interval), Seconds(interval)).map(x =>
  new String(x))

Now I want to convert this stream into a single RDD[String].
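
A sketch of one way to fold the batch RDDs into a single RDD on the driver. The caveat, and the reason the HDFS route mentioned in the reply below is more robust, is that each batch has to be materialized (cached and counted here) before the streaming context stops, otherwise the underlying blocks may already be gone:

import scala.collection.mutable.ArrayBuffer
import org.apache.spark.rdd.RDD

// Gather every batch RDD as it arrives; union them once streaming is done.
val batches = ArrayBuffer.empty[RDD[String]]
streams.foreachRDD { rdd =>
  rdd.cache()
  rdd.count()        // force materialization while the batch blocks still exist
  batches += rdd
}

ssc.start()
ssc.awaitTermination(interval * 5 * 1000L)   // hypothetical run length
ssc.stop(false)

val all: RDD[String] = ssc.sparkContext.union(batches)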


Any help please.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/converting-DStream-String-into-RDD-String-in-spark-streaming-tp20253.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: converting DStream[String] into RDD[String] in spark streaming

2014-12-03 Thread Hafiz Mujadid
Thanks! It works well to save this data to HDFS and then load it back into an
RDD :)



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/converting-DStream-String-into-RDD-String-in-spark-streaming-tp20253p20258.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark Streaming empty RDD issue

2014-12-03 Thread Hafiz Mujadid
Hi Experts
I am using Spark Streaming with Kafka for real-time data processing, and I am
facing some issues related to Spark Streaming. I want to know how we can detect that:
1) our connection has been lost,
2) our receiver is down,
3) Spark Streaming has no new messages to consume.

How can we deal with these issues?
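
A sketch of the hooks that might cover these three cases, assuming a receiver-based Kafka stream; the listener callbacks and the ReceiverInfo field names below are from the Spark 1.x streaming API, and kafkaStream is a hypothetical DStream created with KafkaUtils.createStream:

import org.apache.spark.streaming.scheduler._

// Cases 1 and 2: react to receiver errors and receiver shutdown.
class ReceiverHealthListener extends StreamingListener {
  override def onReceiverError(e: StreamingListenerReceiverError): Unit =
    println(s"Receiver error: ${e.receiverInfo.lastErrorMessage}")
  override def onReceiverStopped(s: StreamingListenerReceiverStopped): Unit =
    println(s"Receiver stopped: ${s.receiverInfo.name}")
}
ssc.addStreamingListener(new ReceiverHealthListener)

// Case 3: detect batches that carry no new messages.
kafkaStream.foreachRDD { rdd =>
  if (rdd.take(1).isEmpty) println("No new messages in this batch")
}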

I will be glad to hear from you and will be thankful to you.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-empty-RDD-issue-tp20329.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org