Spark streaming and session windows

2015-08-07 Thread Ankur Chauhan
Any help would be appreciated. -- Ankur Chauhan

[Spark Streaming] Session based windowing like in google dataflow

2015-08-07 Thread Ankur Chauhan
Any help would be appreciated. -- Ankur Chauhan
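
For readers unfamiliar with the term in the subject line, session windowing groups each key's events into sessions that close after a period of inactivity. The following is a minimal pure-Python sketch of that idea, not Spark or Dataflow code; the function name `sessionize`, the `(key, timestamp)` event shape, and the `gap` parameter are all assumptions made for illustration.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (key, timestamp) events into per-key sessions.

    A new session starts whenever the time since the previous event
    for the same key exceeds `gap`. Each session is reported as a
    (first_timestamp, last_timestamp) tuple.
    """
    by_key = defaultdict(list)
    for key, ts in events:
        by_key[key].append(ts)

    sessions = {}
    for key, stamps in by_key.items():
        stamps.sort()
        windows = [[stamps[0], stamps[0]]]
        for ts in stamps[1:]:
            if ts - windows[-1][1] <= gap:
                # Still within the inactivity gap: extend the open session.
                windows[-1][1] = ts
            else:
                # Gap exceeded: close the session and open a new one.
                windows.append([ts, ts])
        sessions[key] = [tuple(w) for w in windows]
    return sessions


example = sessionize([("a", 1), ("a", 3), ("a", 20), ("b", 5)], gap=5)
print(example)
```

This batch version sorts all events up front; a streaming job would instead have to keep per-key state between batches and decide when a session is final.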

Re: use S3-Compatible Storage with spark

2015-07-17 Thread Ankur Chauhan
The endpoint is the property you want to set. I would look at the source for that. On Jul 17, 2015, at 08:55, Sujit Pal sujitatgt...@gmail.com wrote: Hi Schmirr, The part after the s3n:// is your bucket name and folder name, i.e.
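
The quoted point about the URI layout (bucket name first, then the folder path) can be illustrated with a small sketch. It uses only Python's standard `urllib.parse`; the helper name `split_s3_uri` and the bucket `example-bucket` are hypothetical, and the sketch does not touch the endpoint property itself, whose exact name is not given in the snippet.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3n://bucket/folder/... URI into (bucket, key prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("s3", "s3n", "s3a"):
        raise ValueError(f"not an S3-style URI: {uri!r}")
    # The authority part right after s3n:// is the bucket name;
    # everything after the next slash is the folder (key prefix).
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3n://example-bucket/logs/2015/07/")
print(bucket, prefix)
```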

RE: Using Spark like a search engine

2015-05-25 Thread ankur chauhan
Hi, I am sure you can use Spark for this, but it seems like a problem that should be delegated to a text-based indexing technology like Elasticsearch or something based on Lucene to serve the requests. Spark can be used to prepare the data that can be fed to the indexing service. Using Spark

Re: Spark on Mesos vs Yarn

2015-05-15 Thread Ankur Chauhan
to the JIRA and pull requests so that I can keep track of the progress/features. Again, thanks for replying. -- Ankur Chauhan On 15/05/2015 00:39, Tim Chen wrote: Hi Ankur, This is a great question as I've heard similar concerns about Spark on Mesos. At the time when I started to contribute

Spark on Mesos vs Yarn

2015-05-15 Thread Ankur Chauhan
equally important, but has this changed now that Spark has almost reached the 1.4.0 stage? -- Ankur Chauhan

Re: kafka + Spark Streaming with checkPointing fails to restart

2015-05-14 Thread Ankur Chauhan
Thanks everyone, that was the problem. The function that creates a new streaming context was supposed to set up the stream processing as well as the checkpoint directory. I had missed the whole process of checkpoint setup. With that done, everything works as
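
The fix described here matches the usual get-or-create pattern for checkpointed streaming jobs: the factory function builds the whole pipeline and sets the checkpoint directory, and it is only invoked when no checkpoint exists yet. Below is a minimal PySpark sketch of that pattern, not the poster's actual code. The checkpoint path, application name, batch interval, and the socket source standing in for Kafka are all assumptions; the imports are kept inside the functions so the definitions load even where Spark is not installed.

```python
CHECKPOINT_DIR = "/tmp/streaming-checkpoint"  # hypothetical path


def create_context():
    """Build a fresh StreamingContext, including the checkpoint setup."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="checkpoint-sketch")  # assumed app name
    ssc = StreamingContext(sc, 10)  # 10-second batches (assumed)

    # All stream processing must be defined inside this function.
    # A socket source is used as a placeholder; the thread used Kafka.
    lines = ssc.socketTextStream("localhost", 9999)
    lines.count().pprint()

    # The step that was reported missing: register the checkpoint directory.
    ssc.checkpoint(CHECKPOINT_DIR)
    return ssc


def get_or_create_context():
    """Restore from the checkpoint if present, else call create_context."""
    from pyspark.streaming import StreamingContext

    return StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
```

A driver would call `get_or_create_context()`, then `start()` and `awaitTermination()` on the returned context. On restart the context is rebuilt from the checkpoint data, so `create_context` is not called again.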

kafka + Spark Streaming with checkPointing fails to restart

2015-05-13 Thread Ankur Chauhan
6b932540ac87577e4ce8385d26699c1a7d05e/spark-console.log Could someone tell me what causes this problem? I tried looking at the stack trace, but I am not familiar enough with the codebase to make solid assertions. Any ideas as to what may be happening here? -- Ankur Chauhan

Re: how to monitor multi directories in spark streaming task

2015-05-13 Thread Ankur Chauhan
/2015/05/13/data.txt like this, with one new directory each day. How do I create the new DStream for tomorrow's new directory (/user/root/2015/05/13/)? On May 13, 2015, at 4:59 PM, Ankur Chauhan achau...@brightcove.com wrote: I would suggest creating one DStream per directory and then using

Re: how to monitor multi directories in spark streaming task

2015-05-13 Thread Ankur Chauhan
I would suggest creating one DStream per directory and then using StreamingContext#union(...) to get a union DStream. -- Ankur On 13/05/2015 00:53, hotdog wrote: I want to use fileStream in Spark Streaming to monitor multiple HDFS directories,
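
The suggestion above can be sketched in PySpark roughly as follows. This is an illustration under assumptions rather than code from the thread: the helper name `union_of_directories` is hypothetical, and it uses `textFileStream` (the file-based source available in the Python API) in place of the `fileStream` call mentioned in the question.

```python
def union_of_directories(ssc, directories):
    """Create one text-file DStream per directory and union them.

    `ssc` is expected to be a pyspark.streaming.StreamingContext.
    """
    if not directories:
        raise ValueError("at least one directory is required")

    # One DStream per monitored directory.
    streams = [ssc.textFileStream(d) for d in directories]

    if len(streams) == 1:
        return streams[0]
    # StreamingContext.union takes the DStreams as positional arguments.
    return ssc.union(*streams)
```

The set of directories is fixed when the DStreams are created, so this sketch does not by itself handle directories that appear later.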

data schema and serialization format suggestions

2015-05-13 Thread Ankur Chauhan
with this problem. At a high level, the requirements are fairly simple: 1. Simple and easy to understand and extend. 2. Usable in places other than Spark (I would want to use them in other applications and tools). 3. Ability to play nicely with Parquet and Kafka (nice to have). -- Ankur Chauhan

Spark streaming updating a large window more frequently

2015-05-08 Thread Ankur Chauhan
really appreciate pointers to code samples or some blogs that could help me identify best practices. -- Ankur Chauhan

Re: history server

2015-05-07 Thread Ankur Chauhan
Hi, Sorry, this may be a little off topic, but I tried searching for docs on the history server and couldn't really find much. Can someone point me to a doc or give me a point of reference for the use and intent of a history server? -- Ankur On 7 May 2015, at 12:06, Koert Kuipers

Re: Nightly builds/releases?

2015-05-04 Thread Ankur Chauhan
/SPARK-1517 On May 4, 2015, at 10:25 PM, Ankur Chauhan achau...@brightcove.com wrote: Hi, Does anyone know if Spark has any nightly builds or equivalent that provide binaries that have passed a CI build, so that one could try out the bleeding edge without having to compile? -- Ankur

Spark + Mesos + HDFS resource split

2015-04-27 Thread Ankur Chauhan
(or EC2 instance or VM). My question is: What is the recommended resource split? How much memory and CPU should I preallocate for HDFS, and how much should I set aside as allocatable by Mesos? In addition, is there some rule-of-thumb recommendation around this? -- Ankur Chauhan

Re: spark mesos deployment : starting workers based on attributes

2015-04-04 Thread Ankur Chauhan
Hi, Created issue: https://issues.apache.org/jira/browse/SPARK-6707 I would really appreciate ideas/views/opinions on this feature. -- Ankur Chauhan On 03/04/2015 13:23, Tim Chen wrote: Hi Ankur, There isn't a way to do that yet, but it's

Re: spark mesos deployment : starting workers based on attributes

2015-04-03 Thread Ankur Chauhan
Hi, Thanks! I'll add the JIRA. I'll also try to work on a patch this weekend. -- Ankur Chauhan On 03/04/2015 13:23, Tim Chen wrote: Hi Ankur, There isn't a way to do that yet, but it's simple to add. Can you create a JIRA in Spark

spark mesos deployment : starting workers based on attributes

2015-04-03 Thread Ankur Chauhan
such a behavior. Thanks! -- Ankur Chauhan

Mesos - spark task constraints

2015-04-02 Thread Ankur Chauhan
(if available) and then prefer other nodes. I realize that the prefer part may not be possible, but I at least want to start with just getting them to run only on the Tachyon-enabled nodes. Also, if someone could give me a pointer to the Mesos scheduler code in Spark, that would be great. -- Ankur Chauhan

deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
HDFS/Tachyon cluster. -- Ankur Chauhan

Re: deployment of spark on mesos and data locality in tachyon/hdfs

2015-03-31 Thread Ankur Chauhan
configuration to talk to S3 or the HDFS datanode) and the Mesos slave process. Is this correct? On 31/03/2015 16:43, Haoyuan Li wrote: Tachyon should be co-located with Spark in this case. Best, Haoyuan On Tue, Mar 31, 2015 at 4:30 PM, Ankur Chauhan achau...@brightcove.com