Any answer to this question, group?
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/MLib-Non-Linear-Optimization-tp27645p27676.html
I'm part of a Predictive Analytics marketing platform. We do a lot of
optimizations (non-linear), currently using SAS / Lindo routines. I was
going through Spark's MLlib documentation and found that it supports linear
optimization; I was wondering if it also supports non-linear optimization, and if
not, are
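Not an authoritative answer, but a pointer: MLlib's optimizers target linear models, while Breeze, the numerics library MLlib builds on, ships a general L-BFGS for smooth non-linear objectives. A minimal driver-side sketch minimizing a toy quadratic, assuming Breeze is on the classpath (it ships with Spark):

import breeze.linalg.DenseVector
import breeze.optimize.{DiffFunction, LBFGS}

// Toy objective f(x) = ||x - 3||^2 with its analytic gradient 2(x - 3).
val f = new DiffFunction[DenseVector[Double]] {
  def calculate(x: DenseVector[Double]): (Double, DenseVector[Double]) = {
    val d = x - DenseVector.fill(x.length)(3.0)
    (d dot d, d * 2.0)
  }
}

val lbfgs = new LBFGS[DenseVector[Double]](maxIter = 100, m = 7)
val xOpt = lbfgs.minimize(f, DenseVector.zeros[Double](5)) // converges to all 3s

This runs entirely on the driver; distributing the objective/gradient evaluation over an RDD is a separate exercise.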
Hi All,
I'm facing performance issues with my Spark implementation. While briefly
investigating the Web UI logs, I noticed that my RDD size is 55 GB, the
Shuffle Write is 10 GB, and the Input Size is 200 GB. The application is a web
application which does predictive analytics, so we keep most of our data in
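Not a diagnosis from those three numbers alone, but when the cached RDD size (55 GB here) is the worry, serialized storage is a cheap first lever. A sketch with a hypothetical input path:

import org.apache.spark.storage.StorageLevel

// MEMORY_ONLY_SER keeps each partition as a serialized byte buffer,
// typically shrinking the in-memory footprint at the cost of CPU on access.
val events = sc.textFile("hdfs:///data/events")   // placeholder path
  .persist(StorageLevel.MEMORY_ONLY_SER)

Pairing this with Kryo (the spark.serializer setting) usually shrinks it further.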
Thanks! I shall try it out.
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20683.html
Any thoughts on how Spark SQL could help in our scenario?
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Does-filter-on-an-RDD-scan-every-data-item-tp20170p20465.html
Thanks for the reply!
To be honest, I was expecting Spark to have some sort of indexing for keys,
which would help it locate them efficiently.
I wasn't using Spark SQL here, but if it helps perform this efficiently, I
can try it out. Can you please elaborate on how it will be helpful in this
I'm not sure sample() is what I was looking for.
As mentioned in another post above, this is what I'm looking for:
1) My RDD contains this structure: Tuple2<CustomTuple, Double>.
2) Each CustomTuple is a combination of string ids, e.g.
CustomTuple.dimensionOne=AE232323
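For what it's worth: a plain filter has no index to consult, but a pair RDD with a known partitioner can narrow single-key reads, because lookup() only scans the partition that could contain the key. A sketch with stand-in data (a simple Int key in place of CustomTuple, which would additionally need proper equals/hashCode):

import org.apache.spark.HashPartitioner

// Stand-in pair RDD; in the real case the key would be the CustomTuple.
val pairs = sc.parallelize(1 to 10000000).map(i => (i, i.toDouble))

// partitionBy pins a partitioner; cache so repeated reads skip recomputation.
val byKey = pairs.partitionBy(new HashPartitioner(100)).cache()

// With a partitioner present, lookup() visits only the matching partition.
val hits: Seq[Double] = byKey.lookup(42)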
Hi,
I wanted some clarity on the functioning of an RDD's filter function.
1) Does filter scan every element saved in the RDD? If my RDD
represents 10 million rows and I want to work on only 1000 of them, is
there an efficient way of filtering the subset without having to scan every element?
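On question 1: filter is a narrow transformation that evaluates its predicate on every element; an RDD keeps no index. If the wanted rows are identified by a small key set, broadcasting that set at least keeps the per-element test cheap. A sketch with made-up sizes:

// 10 million elements, of which we want the 1000 whose ids are in a small set.
val rdd = sc.parallelize(1 to 10000000)
val wantedIds = sc.broadcast((1 to 1000).toSet)

// Still a full scan, but each test is a cheap hashed-set probe.
val subset = rdd.filter(x => wantedIds.value.contains(x))
subset.count()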
We have a web application which talks to a Spark server.
This is how we have done the integration:
1) In Tomcat's classpath, add the Spark distribution jar so Spark code
is available at runtime (for you it would be Jetty).
2) In the web application project, add the Spark distribution jar
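To make the setup concrete, a rough sketch of the driver-side wiring once the jars are on the classpath; the app name, master URL, and jar path below are placeholders, not details from this thread:

import org.apache.spark.{SparkConf, SparkContext}

// One long-lived context owned by the web application.
val conf = new SparkConf()
  .setAppName("webapp-analytics")              // placeholder name
  .setMaster("spark://spark-master:7077")      // placeholder master URL
  .setJars(Seq("/path/to/webapp-jobs.jar"))    // hypothetical jar holding the job classes
val sc = new SparkContext(conf)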
Hi,
We have a requirement where we have two data sets represented by RDDs,
RDDA and RDDB.
For performing an aggregation operation on RDDA, the action would need
a subset of RDDB's data. I wanted to understand whether there is a best practice
for doing this? I don't even know how this will be possible as of
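One common pattern, when the slice of RDDB that the aggregation needs is small enough for the driver: collect it and broadcast it, so the pass over RDDA needs no shuffle. A sketch with hypothetical keyed data:

// Hypothetical shapes: both RDDs keyed by a String id.
val rddA = sc.parallelize(Seq(("k1", 2.0), ("k2", 3.0), ("k1", 5.0)))
val rddB = sc.parallelize(Seq(("k1", 10.0), ("k2", 20.0), ("k3", 30.0)))

// Ship only the subset of B that A's aggregation actually needs.
val bSlice = sc.broadcast(rddB.filter { case (k, _) => k != "k3" }.collectAsMap())

val result = rddA
  .map { case (k, v) => (k, v * bSlice.value.getOrElse(k, 1.0)) }
  .reduceByKey(_ + _)

If the needed slice is too big to collect, a keyed join between the two RDDs is the fallback.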
Spark devs / users, any help in this regard would be appreciated; we are kind of
stuck at this point.
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-Key-Structure-in-pairRDD-tp18461p18557.html
Hi,
We are trying to adopt Spark for our application.
We have an analytical application which stores data in star schemas (SQL
Server). All the cubes are loaded into a key/value structure and saved in
Trove (an in-memory collection). Here the key is a short array, where each short
number
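A sketch of how that structure might map onto a pair RDD. CubeKey here is a made-up wrapper: a raw Array[Short] has reference equality, so it would misbehave as an RDD key, whereas a case class over an immutable Vector compares structurally:

// Case classes get structural equals/hashCode, which keyed operations need.
case class CubeKey(dims: Vector[Short])

val cells = sc.parallelize(Seq(
  CubeKey(Vector[Short](1, 7, 3)) -> 42.0,
  CubeKey(Vector[Short](1, 8, 3)) -> 17.5
))

// Slice the cube on the first dimension, then aggregate the measure.
val total = cells.filter(_._1.dims(0) == 1).values.sum()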
Thanks for the response!! Will try to see the behaviour with cache().
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Task-size-variation-while-using-Range-Vs-List-tp18243p18318.html
I noticed a behaviour where, if I'm using

val temp = sc.parallelize(1 to 10)
temp.collect()

the task size will be some number of bytes, let's say 1120 bytes. But if I change this to a for loop,

import scala.collection.mutable.ArrayBuffer

val data = new ArrayBuffer[Integer]()
for (i <- 1 to 10) data += i     // loop body reconstructed from the truncated snippet
val temp = sc.parallelize(data)  // an ArrayBuffer ships its elements with each task,
temp.collect()                   // unlike a Range, which is just (start, end, step)
From what I've observed, there are no debug logs while serialization takes
place. You can look at the source code if you want; the TaskSetManager class has
some functions for serialization.
Hi, I'm new to Spark and am facing a peculiar problem.
I'm writing a simple Java driver program where I'm creating a key/value data
structure and collecting it once created. The problem I'm facing is that
when I increase the iterations of the for loop which creates the ArrayList of
Long values
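Without the rest of the message this is only a guess, but the usual suspect in that shape of program is collect(): it materializes the whole result in the driver JVM, so heap use grows linearly with the loop count. A Scala sketch of the pattern (a Java driver behaves the same way):

val n = 10000000L  // grows with the loop iterations
val pairs = sc.parallelize(0L until n).map(i => (i, i * 2))

// pairs.collect() would pull all n tuples into the driver and eventually OOM;
// aggregate on the cluster, or sample, instead.
val count = pairs.count()
val firstFew = pairs.take(10)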
Hi, I'm pretty new to both Big Data and Spark. I've just started POC work on
Spark, and my team and I are evaluating it against other in-memory computing
tools such as GridGain, BigMemory, Aerospike, and some others too, specifically
to solve two sets of problems. 1) Data Storage: Our current application
runs
Anybody with good hands-on experience with Spark, please do reply. It would help us a
lot!!
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Concepts-tp16477p16536.html