Re: Get filename in Spark Streaming

2015-02-06 Thread Subacini B
-file-name-in-map-td6551.html -- Emre Sevinç On Fri, Feb 6, 2015 at 2:16 AM, Subacini B subac...@gmail.com wrote: Hi All, We have filenames with timestamps, e.g. ABC_1421893256000.txt, and the timestamp needs to be extracted from the file name for further processing. Is there a way to get input file

Get filename in Spark Streaming

2015-02-05 Thread Subacini B
Hi All, We have filenames with timestamps, e.g. ABC_1421893256000.txt, and the timestamp needs to be extracted from the file name for further processing. Is there a way to get the input file name picked up by the Spark Streaming job? Thanks in advance Subacini
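The timestamp-extraction step itself is plain string parsing. A minimal sketch, assuming the `ABC_<epoch-millis>.txt` naming pattern from the example above (getting the input file name inside the streaming job is the separate question addressed in the linked thread):

```python
import re
from datetime import datetime, timezone

def extract_timestamp(filename):
    """Pull the epoch-milliseconds timestamp out of names like ABC_1421893256000.txt."""
    m = re.search(r'_(\d{13})\.txt$', filename)
    if m is None:
        raise ValueError("no timestamp in %r" % filename)
    millis = int(m.group(1))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

print(extract_timestamp("ABC_1421893256000.txt"))  # 2015-01-22 02:20:56+00:00
```

The 13-digit assumption matches millisecond epochs around 2015; a different naming scheme would need a different pattern.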

Improve performance using spark streaming + sparksql

2015-01-24 Thread Subacini B
Hi All, I have a cluster of 3 nodes [each 8 cores/32 GB memory]. My program uses Spark Streaming with Spark SQL [Spark 1.1] and writes incoming JSON to Elasticsearch and HBase. Below is my code, and I receive JSON files [input data varies from 30 MB to 300 MB] every 10 seconds. Irrespective of 3

Re: SchemaRDD to Hbase

2014-12-20 Thread Subacini B
Hi, can someone help me? Any pointers would help. Thanks Subacini On Fri, Dec 19, 2014 at 10:47 PM, Subacini B subac...@gmail.com wrote: Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase. Thanks

SchemaRDD to Hbase

2014-12-19 Thread Subacini B
Hi All, Is there any API that can be used directly to write a SchemaRDD to HBase? If not, what is the best way to write a SchemaRDD to HBase? Thanks Subacini
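There was no direct SchemaRDD-to-HBase writer in Spark 1.x; the usual route is to map each row to an HBase Put (row key plus column-family:qualifier cells) and save through Hadoop's `TableOutputFormat`. A Spark-free sketch of just the row-to-cells mapping — the column family `cf` and the `id` key column are assumptions for illustration:

```python
def row_to_put(row, key_col="id", family="cf"):
    """Map a Row-like dict to (row_key, {family:qualifier: value}) for an HBase Put."""
    row_key = str(row[key_col])
    cells = {"%s:%s" % (family, col): str(val)
             for col, val in row.items() if col != key_col}
    return row_key, cells

key, cells = row_to_put({"id": 42, "name": "abc", "score": 7})
print(key, cells)  # 42 {'cf:name': 'abc', 'cf:score': '7'}
```

In a real job this function would run per row inside the Spark action, with the resulting pairs handed to the HBase client or output format.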

Processing multiple request in cluster

2014-09-24 Thread Subacini B
Hi All, how can I run multiple requests concurrently on the same cluster? I have a program using *spark streaming context* which reads *streaming data* and writes it to HBase. It works fine; the problem is that when multiple requests are submitted to the cluster, only the first request is processed as the entire

Re: Spark SQL - groupby

2014-07-03 Thread Subacini B
Hi, can someone provide me pointers for this issue? Thanks Subacini On Wed, Jul 2, 2014 at 3:34 PM, Subacini B subac...@gmail.com wrote: Hi, The code below throws a compilation error: not found: *value Sum*. Can someone help me with this? Do I need to add any jars or imports? Even for Count

Spark SQL - groupby

2014-07-02 Thread Subacini B
Hi, the code below throws a compilation error: not found: *value Sum*. Can someone help me with this? Do I need to add any jars or imports? Even for Count, the same error is thrown. val queryResult = sql(select * from Table) queryResult.groupBy('colA)('colA,*Sum*('colB) as 'totB).aggregate(*Sum*
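A `not found: value Sum` error in the 1.0-era DSL usually means the aggregate expression classes are not in scope (at the time they lived under `org.apache.spark.sql.catalyst.expressions`; worth checking against your exact Spark version). The intended result, the per-key total of colB grouped by colA, can be sketched without Spark:

```python
from collections import defaultdict

def group_by_sum(rows, key="colA", value="colB"):
    """Equivalent of groupBy('colA)(Sum('colB)): total value per key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)

rows = [{"colA": "x", "colB": 1}, {"colA": "y", "colB": 2}, {"colA": "x", "colB": 3}]
print(group_by_sum(rows))  # {'x': 4, 'y': 2}
```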

Shark Vs Spark SQL

2014-07-02 Thread Subacini B
Hi, http://mail-archives.apache.org/mod_mbox/spark-user/201403.mbox/%3cb75376b8-7a57-4161-b604-f919886cf...@gmail.com%3E This thread says the Shark backend will be replaced with the Spark SQL engine in the future. Does that mean Spark will continue to support Shark + Spark SQL long term? OR After some

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
Hi, I am stuck here; my cluster is not efficiently utilized. Appreciate any input on this. Thanks Subacini On Sat, Jun 7, 2014 at 10:54 PM, Subacini B subac...@gmail.com wrote: Hi All, My cluster has 5 workers, each having 4 cores (so 20 cores total). It is in standalone mode (not using

Re: Spark Worker Core Allocation

2014-06-08 Thread Subacini B
resource from a few nodes. On Jun 8, 2014 1:55 AM, Subacini B subac...@gmail.com wrote: Hi All, My cluster has 5 workers, each having 4 cores (so 20 cores total). It is in standalone mode (not using Mesos or YARN). I want two programs to run at the same time, so I have configured spark.cores.max=3
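In standalone mode a running application grabs every available core by default, so a second submission waits with no resources; capping each app with `spark.cores.max` is the usual fix, as the thread discusses. A hedged configuration sketch (the values are illustrations for a 20-core cluster, not the figures from the thread):

```
# conf/spark-defaults.conf, or per app via spark-submit --conf
spark.cores.max        8
spark.executor.memory  4g
```

With two apps capped this way, 16 of the 20 cores are in use concurrently and 4 remain free for a third submission.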