JavaSparkContext: dependency on ui/

2016-06-27 Thread jay vyas
arkContext.(JavaSparkContext.scala:58) -- jay vyas

Spark jobs without a login

2016-06-16 Thread jay vyas
not too worried about this - but it seems like it might be nice if we could specify a user name as part of the Spark context, or as an external parameter, rather than having to use the Java-based user/group extractor. -- jay vyas

Re: Unit Testing

2015-08-13 Thread jay vyas
a producer and a consumer, so that you don't get a starvation scenario. On Wed, Aug 12, 2015 at 7:31 PM, Mohit Anchlia mohitanch...@gmail.com wrote: Is there a way to run spark streaming methods in standalone eclipse environment to test out the functionality? -- jay vyas
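The producer/consumer point above can be sketched without any Spark dependency. A minimal illustration (plain Python threads and a queue; all names are hypothetical, not Spark API): keeping the two roles on separate threads is what prevents a standalone test from starving itself.

```python
import queue
import threading

def run_producer_consumer(batches):
    """Run a producer thread and a consumer thread against a shared queue.

    Keeping the two roles on separate threads is the shape a standalone
    streaming test needs: a single thread trying to do both would starve
    itself on the blocking queue operations.
    """
    q = queue.Queue()
    results = []

    def producer():
        for batch in batches:
            q.put(batch)
        q.put(None)  # sentinel: signals end of input

    def consumer():
        while True:
            batch = q.get()
            if batch is None:
                break
            results.append(sum(batch))  # stand-in for per-batch logic

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_producer_consumer([[1, 2], [3]])` yields one summed result per batch.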

Re: Amazon DynamoDB Spark

2015-08-07 Thread Jay Vyas
In general the simplest way is to use the Dynamo Java API as is and call it inside a map(), using the asynchronous put() Dynamo API call. On Aug 7, 2015, at 9:08 AM, Yasemin Kaya godo...@gmail.com wrote: Hi, Is there a way of using DynamoDB in a Spark application? I have to

Re: How to build Spark with my own version of Hadoop?

2015-07-22 Thread jay vyas
, 2015 at 11:11 PM, Dogtail Ray spark.ru...@gmail.com wrote: Hi, I have modified some Hadoop code, and want to build Spark with the modified version of Hadoop. Do I need to change the compilation dependency files? How to then? Great thanks! -- jay vyas

Re: Spark Streaming on top of Cassandra?

2015-05-21 Thread jay vyas

Re: Re: spark streaming printing no output

2015-04-16 Thread jay vyas
. Please let me know if I am doing something wrong. -- jay vyas

Re: Submitting to a cluster behind a VPN, configuring different IP address

2015-04-02 Thread jay vyas

Re: Single threaded laptop implementation beating a 128 node GraphX cluster on a 1TB data set (128 billion nodes) - What is a use case for GraphX then? when is it worth the cost?

2015-03-30 Thread jay vyas
Just the same as spark was disrupting the hadoop ecosystem by changing the assumption that you can't rely on memory in distributed analytics... now maybe we are challenging the assumption that big data analytics need to be distributed at all? I've been asking the same question lately and seen similarly that

Re: Untangling dependency issues in spark streaming

2015-03-29 Thread jay vyas
(PoolingHttpClientConnectionManager.java:114) -- jay vyas

Re: Apache Ignite vs Apache Spark

2015-02-26 Thread Jay Vyas
- https://wiki.apache.org/incubator/IgniteProposal has, I think, been updated recently and has a good comparison. - Although GridGain has been around since the spark days, Apache Ignite is quite new and just getting started, I think, so you will probably want to reach out to the developers

Re: Spark Streaming and message ordering

2015-02-18 Thread jay vyas
! -- jay vyas

Re: Strongly Typed SQL in Spark

2015-02-11 Thread jay vyas
Ah, nevermind, I just saw http://spark.apache.org/docs/1.2.0/sql-programming-guide.html (language integrated queries) which looks quite similar to what I was thinking about. I'll give that a whirl... On Wed, Feb 11, 2015 at 7:40 PM, jay vyas jayunit100.apa...@gmail.com wrote: Hi spark

Strongly Typed SQL in Spark

2015-02-11 Thread jay vyas
).by(product,meta=product.id=meta.id).toSchemaRDD? I know the above snippet is totally wacky but you get the idea :) -- jay vyas

SparkSQL DateTime

2015-02-09 Thread jay vyas
just for dealing with time stamps. What's the simplest and cleanest way to map non-Spark time values into SparkSQL-friendly time values? - One option could be a custom SparkSQL type, I guess? - Any plan to have native Spark SQL support for Joda Time or (yikes) java.util.Calendar? -- jay vyas
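Pending native support, the mapping asked about above is often handled by normalizing timestamps before they reach Spark SQL. A minimal sketch (plain Python, not a Spark API; it assumes UTC for naive values):

```python
from datetime import datetime, timezone

def to_sql_friendly(dt):
    """Convert a datetime into two SQL-friendly forms:
    epoch seconds (an integer) and an ISO-8601 string.
    Naive datetimes are assumed to be UTC (an assumption of this sketch)."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return int(dt.timestamp()), dt.isoformat()
```

For example, `to_sql_friendly(datetime(2015, 2, 9, tzinfo=timezone.utc))` gives `(1423440000, "2015-02-09T00:00:00+00:00")`; either form round-trips cleanly through a SQL layer.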

Re: Discourse: A proposed alternative to the Spark User list

2015-01-21 Thread Jay Vyas
It's a very valid idea indeed, but... It's a tricky subject, since the entire ASF is run on mailing lists; hence there are so many different but equally sound ways of looking at this idea, which conflict with one another. On Jan 21, 2015, at 7:03 AM, btiernay btier...@hotmail.com wrote: I

Re: Problems with Spark Core 1.2.0 SBT project in IntelliJ

2015-01-13 Thread Jay Vyas
I find importing a working SBT project into IntelliJ is the way to go. How did you load the project into IntelliJ? On Jan 13, 2015, at 4:45 PM, Enno Shioji eshi...@gmail.com wrote: Had the same issue. I can't remember what the issue was but this works: libraryDependencies ++= {

Re: Spark Streaming Threading Model

2014-12-19 Thread jay vyas
it just process them? Asim -- jay vyas

Re: Unit testing and Spark Streaming

2014-12-12 Thread Jay Vyas
https://github.com/jayunit100/SparkStreamingCassandraDemo On this note, I've built a framework which is mostly pure so that functional unit tests can be run composing mock data for Twitter statuses, with just regular junit... That might be relevant also. I think at some point we should come

Re: Spark-Streaming: output to cassandra

2014-12-05 Thread Jay Vyas
Here's an example of a Cassandra ETL that you can follow, which should exit on its own. I'm using it as a blueprint for developing spark streaming apps on top of. For me, I kill the streaming app with System.exit after a sufficient amount of data is collected. That seems to work for most any
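The kill-after-enough-data pattern described above can be sketched in plain Python (no Spark dependency; names hypothetical): a worker thread collects records and signals shutdown once a threshold is reached, which is the point where a streaming app would call System.exit.

```python
import threading

def run_until_enough(source, threshold, poll_interval=0.01):
    """Collect records from `source` on a worker thread and signal
    shutdown once `threshold` records have arrived, the same shape as
    stopping a streaming job after enough data has been collected."""
    collected = []
    done = threading.Event()

    def worker():
        for record in source:
            collected.append(record)
            if len(collected) >= threshold:
                done.set()  # enough data: signal the main thread to stop
                return

    t = threading.Thread(target=worker)
    t.start()
    while not done.wait(poll_interval):  # block until the worker signals
        pass
    t.join()
    return collected
```

In a real job the `done.set()` branch is where the app would tear the streaming context down (or exit outright).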

Re: How to execute a custom python library on spark

2014-11-25 Thread jay vyas
if one can point to an example library and how to run it :) Thanks -- jay vyas

Re: Code works in Spark-Shell but Fails inside IntelliJ

2014-11-20 Thread Jay Vyas
This seems pretty standard: your IntelliJ classpath isn't matched to the correct ones that are used in the spark shell. Are you using the SBT plugin? If not, how are you putting deps into IntelliJ? On Nov 20, 2014, at 7:35 PM, Sanjay Subramanian sanjaysubraman...@yahoo.com.INVALID wrote:

Re: Does Spark Streaming calculate during a batch?

2014-11-13 Thread jay vyas
that only after all the batch data was in? Thanks -- jay vyas

Re: Streaming: getting total count over all windows

2014-11-13 Thread jay vyas

Re: Spark streaming cannot receive any message from Kafka

2014-11-12 Thread Jay Vyas
Yup, it's very important that n > 1 for spark streaming jobs; if local, use local[2]. The thing to remember is that your spark receiver will take a thread to itself and produce data, so you need another thread to consume it. In a cluster manager like YARN or Mesos, the word thread is not used
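The receiver-takes-a-thread point can be illustrated outside Spark. A minimal sketch (plain Python with hypothetical names, not the Spark API): one thread is fully occupied receiving into a bounded buffer, so a second thread must exist to drain it; this is the analogue of why local[1] starves while local[2] works.

```python
import queue
import threading

def receive_and_process(records):
    """Model the two-thread requirement: a receiver thread pushes into a
    bounded buffer (it would block forever with nobody draining it), while
    a separate processing thread consumes. With only one thread available,
    the receiver would occupy it and nothing would ever consume."""
    buffer = queue.Queue(maxsize=1)  # tiny buffer: forces a real handoff
    out = []

    def receiver():
        for r in records:
            buffer.put(r)  # blocks until the processor takes the item
        buffer.put(None)   # sentinel: end of stream

    def processor():
        while True:
            r = buffer.get()
            if r is None:
                break
            out.append(r * 2)  # stand-in for the user's transformation

    rt = threading.Thread(target=receiver)
    pt = threading.Thread(target=processor)
    rt.start()
    pt.start()
    rt.join()
    pt.join()
    return out
```

Dropping the processor thread here would deadlock on the first `buffer.put`, just as a receiver with no free executor thread makes no progress.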

Re: random shuffle streaming RDDs?

2014-11-03 Thread Jay Vyas
A use case would be helpful? Batches of RDDs from Streams are going to have temporal ordering in terms of when they are processed in a typical application ... , but maybe you could shuffle the way batch iterations work On Nov 3, 2014, at 11:59 AM, Josh J joshjd...@gmail.com wrote: When

Re: Transforming the Dstream vs transforming each RDDs in the Dstream.

2014-10-29 Thread jay vyas
of filtering the data collection times the #buckets? thanks, Gerard. -- jay vyas

Re: real-time streaming

2014-10-28 Thread jay vyas

Re: Streams: How do RDDs get Aggregated?

2014-10-21 Thread jay vyas
Hi Spark! I found out why my RDDs weren't coming through in my spark stream. It turns out onStart() needs to return, it seems - i.e. you need to launch the worker part of your start process in a thread. For example def onStartMock():Unit ={ val future = new Thread(new
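The onStart() contract described here can be mocked without Spark. A sketch (plain Python; MockReceiver and store are stand-ins for the Scala Receiver API, not real Spark classes): on_start() launches the consumption loop on its own thread and returns immediately, rather than blocking the caller.

```python
import threading

class MockReceiver:
    """Sketch of the receiver contract: on_start() must return promptly,
    so the actual consumption loop runs on its own thread and hands each
    record to store() (here just a list)."""

    def __init__(self, source):
        self.source = source
        self.stored = []
        self._thread = None

    def store(self, record):
        self.stored.append(record)

    def on_start(self):
        # Launch the worker and return immediately; blocking here would
        # stall whatever machinery called us.
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        for record in self.source:
            self.store(record)

    def await_stop(self):
        self._thread.join()
```

Usage: construct with a source, call `on_start()` (which returns at once), and the records accumulate on the worker thread.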

Re: Streams: How do RDDs get Aggregated?

2014-10-21 Thread jay vyas
, Oct 21, 2014 at 11:02 AM, jay vyas jayunit100.apa...@gmail.com wrote: Hi Spark! I found out why my RDDs weren't coming through in my spark stream. It turns out onStart() needs to return, it seems - i.e. you need to launch the worker part of your start process in a thread

Re: How do you write a JavaRDD into a single file

2014-10-20 Thread jay vyas
be very useful. -- Steven M. Lewis PhD -- jay vyas

Streams: How do RDDs get Aggregated?

2014-10-11 Thread jay vyas
Hi spark! I don't quite understand the semantics of RDDs in a streaming context very well yet. Are there any examples of how to implement custom InputDStreams, with corresponding Receivers, in the docs? I've hacked together a custom stream, which is being opened and is consuming data

Re: Does Ipython notebook work with spark? trivial example does not work. Re: bug with IPython notebook?

2014-10-10 Thread jay vyas
=pyStreamingSparkRDDPipe") data = [1, 2, 3, 4, 5] rdd = sc.parallelize(data) def echo(data): print "python received: %s" % (data) # output winds up in the shell console in my cluster (i.e. the machine I launched pyspark from) rdd.foreach(echo) print "we are done" -- jay vyas

Re: Spark inside Eclipse

2014-10-03 Thread jay vyas

Re: Unit Testing (JUnit) with Spark

2014-07-29 Thread jay vyas

Spark over graphviz (SPARK-1015, SPARK-975)

2014-07-22 Thread jay vyas
is for this; possibly I could lend a hand if there are any loose ends needing to be tied. -- jay vyas

RDD.pipe(...)

2014-07-20 Thread jay vyas
the standard out from the process as its output (I assume that is the most common implementation)? Incidentally, I have not been able to use the pipe command to run an external process yet, so any hints on that would be appreciated. -- jay vyas

Re: RDD.pipe(...)

2014-07-20 Thread jay vyas
, this is essentially an implementation of something analogous to Hadoop's streaming API. On Sun, Jul 20, 2014 at 4:09 PM, jay vyas jayunit100.apa...@gmail.com wrote: According to the api docs for the pipe operator, def pipe(command: String): RDD http://spark.apache.org/docs/1.0.0/api/scala/org/apache
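The streaming-API comparison can be made concrete. A sketch (plain Python, not Spark's implementation) emulating pipe semantics for a single partition: each element is written to the external command's stdin, one per line, and each line of the command's stdout becomes an output element.

```python
import subprocess
import sys

def pipe(elements, command):
    """Emulate RDD.pipe() for one partition: feed each element to the
    external command's stdin (one per line) and return the command's
    stdout lines as the new elements."""
    proc = subprocess.run(
        command,
        input="\n".join(str(e) for e in elements) + "\n",
        capture_output=True,
        text=True,
        check=True,  # fail loudly if the external process fails
    )
    return proc.stdout.splitlines()
```

For instance, piping through a small upper-casing script (using `sys.executable` to stay portable) maps `["a", "b"]` to `["A", "B"]`, the same line-in/line-out contract Hadoop streaming uses.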

Re: Error with spark-submit (formatting corrected)

2014-07-17 Thread Jay Vyas
I think I know what is happening to you. I've looked into this some just this week, so it's fresh in my brain :) hope this helps. When no workers are known to the master, IIRC, you get this message. I think this is how it works. 1) You start your master 2) You start a slave, and give it

Re: SPARK_WORKER_PORT (standalone cluster)

2014-07-16 Thread jay vyas
the slaves can be ephemeral. Since the master is fixed, though, a new slave can reconnect at any time. On Mon, Jul 14, 2014 at 10:01 PM, jay vyas jayunit100.apa...@gmail.com wrote: Hi spark! What is the purpose of the randomly assigned SPARK_WORKER_PORT? From the documentation it says

SPARK_WORKER_PORT (standalone cluster)

2014-07-14 Thread jay vyas
please just point me to the right documentation if I'm missing something obvious :) thanks! -- jay vyas