Re: MLib : Non Linear Optimization

2016-09-07 Thread nsareen
Any answer to this question, group? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/MLib-Non-Linear-Optimization-tp27645p27676.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

MLib : Non Linear Optimization

2016-09-01 Thread nsareen
I'm part of a Predictive Analytics marketing platform team. We do a lot of non-linear optimizations, currently using SAS / Lindo routines. I was going through Spark's MLlib documentation & found it supports linear optimization; I was wondering if it also supports non-linear optimization & if not, are

input size too large | Performance issues with Spark

2015-03-28 Thread nsareen
Hi All, I'm facing performance issues with our Spark implementation, and while briefly investigating the WebUI logs, I noticed that my RDD size is 55 GB, the Shuffle Write is 10 GB, and the Input Size is 200 GB. The application is a web application which does predictive analytics, so we keep most of our data in

Re: Does filter on an RDD scan every data item ?

2014-12-15 Thread nsareen
Thanks! Shall try it out. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20683.html

Re: Does filter on an RDD scan every data item ?

2014-12-05 Thread nsareen
Any thoughts on how Spark SQL could help in our scenario? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20465.html

Re: Does filter on an RDD scan every data item ?

2014-12-04 Thread nsareen
Thanks for the reply! To be honest, I was expecting Spark to have some sort of indexing for keys, which would help it locate the keys efficiently. I wasn't using Spark SQL here, but if it helps perform this efficiently, I can try it out; can you please elaborate on how it will be helpful in this

Re: Does filter on an RDD scan every data item ?

2014-12-04 Thread nsareen
I'm not sure sample is what I was looking for. As mentioned in another post above, this is what I'm looking for: 1) My RDD contains this structure: Tuple2<CustomTuple, Double>. 2) Each CustomTuple is a combination of string IDs, e.g. CustomTuple.dimensionOne=AE232323

Does filter on an RDD scan every data item ?

2014-12-02 Thread nsareen
Hi, I wanted some clarity on the functioning of the filter function of an RDD. 1) Does the filter function scan every element saved in the RDD? If my RDD represents 10 million rows, and I want to work on only 1000 of them, is there an efficient way of filtering the subset without having to scan every
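For context on the question above: RDD.filter evaluates its predicate against every element of every partition, so selecting 1000 rows out of 10 million is still a full scan; to avoid scanning you need some form of key-based index outside the scan. The difference can be sketched with plain Java collections (the class and variable names here are illustrative, not Spark API):

```java
import java.util.*;

public class FilterVsIndex {
    public static void main(String[] args) {
        // Simulated dataset: n rows of {id, value}, plus a hash index over id
        int n = 1_000_000;
        List<long[]> rows = new ArrayList<>(n);
        Map<Long, Long> index = new HashMap<>();
        for (long i = 0; i < n; i++) {
            rows.add(new long[]{i, i * 2});
            index.put(i, i * 2);
        }
        long target = 999_999L;

        // filter: examines every row, which is what RDD.filter does per partition
        long viaScan = rows.stream()
                .filter(r -> r[0] == target)
                .findFirst().get()[1];

        // index: jumps straight to the key without touching the other rows
        long viaIndex = index.get(target);

        System.out.println(viaScan + " " + viaIndex);
    }
}
```

Both paths return the same value; only the work done differs, which is why answers in this thread point toward keyed structures or Spark SQL rather than filter for selective lookups.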

Re: Calling spark from a java web application.

2014-12-02 Thread nsareen
We have a web application which talks to a Spark server. This is how we have done the integration: 1) In Tomcat's classpath, add the Spark distribution jar so that Spark code is available at runtime (for you it would be Jetty). 2) In the web application project, add the Spark distribution jar
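A sketch of step 1 under the assumption of a standard Tomcat layout; the path and jar name below are illustrative and depend on your Spark distribution:

```shell
# $CATALINA_HOME/bin/setenv.sh -- sourced by catalina.sh on startup
# Append the Spark distribution jar so Spark classes resolve at runtime
CLASSPATH="$CLASSPATH:/opt/spark/lib/spark-assembly.jar"
export CLASSPATH
```

Step 2 then adds the same jar as a provided-scope dependency of the web application project so it compiles against the identical classes Tomcat loads.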

RDD Action require data from Another RDD

2014-11-20 Thread nsareen
Hi, We have a requirement where we have two data sets represented by RDDs, RDDA & RDDB. To perform an aggregation operation on RDDA, the action would need a subset of RDDB's data; I wanted to understand if there is a best practice for doing this? Don't even know how this will be possible as of
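The usual Spark pattern for this requirement is to collect the small RDDB subset to the driver and ship it to every task as a broadcast variable (sc.broadcast), then reference it inside the aggregation over RDDA. The shape of that computation, sketched here with plain Java collections and made-up data rather than real RDDs:

```java
import java.util.*;

public class BroadcastJoinSketch {
    public static void main(String[] args) {
        // RDDA: the large dataset of (key, amount) pairs
        List<Map.Entry<String, Integer>> rddA = List.of(
                Map.entry("a", 10), Map.entry("b", 20), Map.entry("a", 5));

        // RDDB subset: a small lookup table; in Spark this would be
        // Broadcast<Map<String, Double>> bc = sc.broadcast(weights)
        Map<String, Double> rddBWeights = Map.of("a", 2.0, "b", 0.5);

        // Aggregation over RDDA that consults the broadcast map inside each task
        Map<String, Double> totals = new HashMap<>();
        for (Map.Entry<String, Integer> row : rddA) {
            double w = rddBWeights.getOrDefault(row.getKey(), 1.0);
            totals.merge(row.getKey(), row.getValue() * w, Double::sum);
        }
        System.out.println(totals); // a -> 30.0, b -> 10.0
    }
}
```

This only works when the RDDB subset fits in each executor's memory; when both sides are large, a keyed join of the two RDDs is the alternative.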

Re: Efficient Key Structure in pairRDD

2014-11-11 Thread nsareen
Spark Dev / Users, help in this regard would be appreciated; we are kind of stuck at this point. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-Key-Structure-in-pairRDD-tp18461p18557.html

Efficient Key Structure in pairRDD

2014-11-09 Thread nsareen
Hi, We are trying to adopt Spark for our application. We have an analytical application which stores data in star schemas (SQL Server). All the cubes are loaded into a key/value structure and saved in Trove (an in-memory collection). Here the key is a short array where each short number

Re: Task size variation while using Range Vs List

2014-11-06 Thread nsareen
Thanks for the response! Will try to see the behaviour with cache(). -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Task-size-variation-while-using-Range-Vs-List-tp18243p18318.html

Task size variation while using Range Vs List

2014-11-05 Thread nsareen
I noticed a behaviour: if I'm using val temp = sc.parallelize(1 to 10); temp.collect, the task size will be small, let's say 1120 bytes. But if I change this to a for loop: import scala.collection.mutable.ArrayBuffer; val data = new ArrayBuffer[Integer](); for(i -
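The likely explanation for the observation above: a Scala Range is fully described by its start, end, and step, while an ArrayBuffer serializes every element, so the task closure grows with the loop count. The effect can be reproduced with plain Java serialization; RangeLike below is a stand-in for scala.Range, not a Spark class:

```java
import java.io.*;
import java.util.ArrayList;

public class TaskSizeDemo {
    // Stand-in for scala.Range: three ints describe the whole sequence
    static final class RangeLike implements Serializable {
        final int start, end, step;
        RangeLike(int start, int end, int step) {
            this.start = start; this.end = end; this.step = step;
        }
    }

    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Same logical sequence of one million integers, two representations
        RangeLike range = new RangeLike(1, 1_000_000, 1);
        ArrayList<Integer> buffer = new ArrayList<>();
        for (int i = 1; i <= 1_000_000; i++) buffer.add(i);

        int rangeSize = serializedSize(range);
        int bufferSize = serializedSize(buffer);
        // The range stays tiny; the buffer grows linearly with element count
        System.out.println(rangeSize + " vs " + bufferSize);
    }
}
```

This is the same reason sc.parallelize(1 to N) ships a constant-size task no matter how large N is, while parallelizing a materialized collection ships all of it.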

Re: How to trace/debug serialization?

2014-11-05 Thread nsareen
From what I've observed, there are no debug logs while serialization takes place. You can see the source code if you want; the TaskSetManager class has some functions for serialization.

Task Size Increases when using loops

2014-10-29 Thread nsareen
Hi, I'm new to Spark and am facing a peculiar problem. I'm writing a simple Java driver program where I'm creating a key/value data structure and collecting it once created. The problem I'm facing is that, when I increase the iterations of a for loop which creates the ArrayList of Long values

Spark Concepts

2014-10-15 Thread nsareen
Hi, I'm pretty new to both Big Data and Spark. I've just started POC work on Spark, and my team and I are evaluating it against other in-memory computing tools such as GridGain, BigMemory, Aerospike and some others too, specifically to solve two sets of problems. 1) Data Storage: Our current application runs

Re: Spark Concepts

2014-10-15 Thread nsareen
Anybody with good hands-on experience with Spark, please do reply. It would help us a lot! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Concepts-tp16477p16536.html