Re: Network statistics, network cost

2019-03-21 Thread Saikat Kanjilal
How about using this: https://github.com/LucaCanali/sparkMeasure Sent from my iPhone On Mar 21, 2019, at 7:46 AM, asma zgolli <zgollia...@gmail.com> wrote: Hello, is there a way to get the network statistics, server and distribution statistics from Spark? I'm looking for that

Re: DataSourceV2 hangouts sync

2018-10-25 Thread Saikat Kanjilal
Ditto, I’d also like to join and am in Seattle; afternoons generally work better for me. Sent from my iPhone On Oct 25, 2018, at 5:02 PM, Wenchen Fan <cloud0...@gmail.com> wrote: Big +1 on this! I live in UTC+8 and I'm available from 8 am, which is 5 pm in the Bay Area. Hopefully we

Re: Spark model serving

2018-08-03 Thread Saikat Kanjilal
@holdenK et al, ping on next steps. Sent from my iPhone On Jul 12, 2018, at 3:47 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote: Thanks so much for responding, Maximiliano. I didn't want this discussion to disappear in the wilderness of dev emails :). Here's what I would like

Re: Spark model serving

2018-07-12 Thread Saikat Kanjilal
, 2018 11:52 AM To: Saikat Kanjilal; Holden Karau Cc: dev Subject: Re: Spark model serving Hi, As I know many of you don't read / are not part of the user list. I'll make a summary of what happened at the summit: We discussed some needs we get in order to start serving our predictions

Re: Spark model serving

2018-07-03 Thread Saikat Kanjilal
Ping, would love to hear back on this. From: Saikat Kanjilal Sent: Tuesday, June 26, 2018 7:27 AM To: dev@spark.apache.org Subject: Spark model serving HoldenK and interested folks, Am just following up on the spark model serving discussions as this is highly

Spark model serving

2018-06-26 Thread Saikat Kanjilal
HoldenK and interested folks, I am just following up on the Spark model serving discussions, as this is highly relevant to what I’m embarking on at work. Is there a concrete list of next steps, or can someone summarize what was discussed at the summit? I would love to have a Seattle version of this

Re: Revisiting Online serving of Spark models?

2018-06-01 Thread Saikat Kanjilal
and post notes after and follow up online. As for Seattle, I would be very interested to meet in person later and discuss ;) _____ From: Saikat Kanjilal <sxk1...@hotmail.com> Sent: Tuesday, May 29, 2018 11:46 AM Subject: Re: Revisiting Online servi

Re: Revisiting Online serving of Spark models?

2018-05-29 Thread Saikat Kanjilal
ing Online serving of Spark models? To: Felix Cheung <felixcheun...@hotmail.com> Cc: Saikat Kanjilal <sxk1...@hotmail.com>, Maximiliano Felice <maximilianofel...@gmail.com>, Joseph Bradley <jos...@databricks.com>, Leif Walsh <leif.wa...@gma

Re: Revisiting Online serving of Spark models?

2018-05-22 Thread Saikat Kanjilal
I’m in exactly the same boat as Maximiliano, have use cases for model serving as well, and would love to join this discussion. Sent from my iPhone On May 22, 2018, at 6:39 AM, Maximiliano Felice wrote: Hi! I don't usually

Re: Performance Benchmark Hbase vs Cassandra

2017-06-29 Thread Saikat Kanjilal
You should think about using YCSB and writing an adapter for Spark perf tests against these databases if one doesn't already exist. See here: https://github.com/brianfrankcooper/YCSB Sent from my iPhone On Jun 29, 2017, at 7:33 PM, Raj, Deepu

Re: Spark madness

2017-05-18 Thread Saikat Kanjilal
rom: Saikat Kanjilal <sxk1...@hotmail.com> Sent: Thursday, May 18, 2017 8:18 PM To: dev@spark.apache.org Subject: Spark madness Hi Devs, I need to read a JSON file from HDFS and turn it into a Scala string. I have dug around for documentation on how to do this and found thi

Spark madness

2017-05-18 Thread Saikat Kanjilal
Hi Devs, I need to read a JSON file from HDFS and turn it into a Scala string. I have dug around for documentation on how to do this and found this: http://stackoverflow.com/questions/30445263/how-to-read-whole-file-in-one-string
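
One possible approach, sketched under assumptions: `SparkContext.wholeTextFiles` returns one `(path, content)` pair per file, so the content of a single file can be taken as a plain Scala `String`. The HDFS path and the object name below are hypothetical placeholders, and this reads the whole file onto the driver, so it only suits files that fit comfortably in driver memory.

```scala
import org.apache.spark.sql.SparkSession

object ReadJsonAsString {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-json-as-string")
      .getOrCreate()

    // Hypothetical path; replace with the real location of the JSON file.
    val path = "hdfs:///tmp/example/input.json"

    // wholeTextFiles yields one (filePath, fileContent) pair per file,
    // so first()._2 is the full text of the single file at `path`.
    val json: String = spark.sparkContext.wholeTextFiles(path).first()._2

    println(s"Read ${json.length} characters")
    spark.stop()
  }
}
```

The object would be packaged and launched with spark-submit. If the goal is to query the JSON rather than hold it as a string, `spark.read.json(path)` is the more usual route.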

ML Repo using spark

2017-04-21 Thread Saikat Kanjilal
Folks, I've been building out a large machine learning repository using Spark as the compute platform, running on YARN and Hadoop. I was wondering if folks have best-practice thoughts on unit and integration testing for this application. I am using spark-submit and a

Re: File JIRAs for all flaky test failures

2017-03-28 Thread Saikat Kanjilal
I'm happy to help out in this effort and will look at that label and see what tests I can look into and/or fix. From: Kay Ousterhout <k...@eecs.berkeley.edu> Sent: Monday, March 27, 2017 9:47 PM To: Reynold Xin Cc: Saikat Kanjilal; Sean Owe

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Saikat Kanjilal
into this. From: Sean Owen <so...@cloudera.com> Sent: Thursday, February 16, 2017 8:45 AM To: Saikat Kanjilal; dev@spark.apache.org Subject: Re: File JIRAs for all flaky test failures I'm not sure what you're specifically suggesting. Of course flaky tests are bad and they should be

Re: File JIRAs for all flaky test failures

2017-02-16 Thread Saikat Kanjilal
: Saikat Kanjilal <sxk1...@hotmail.com> Sent: Wednesday, February 15, 2017 6:12 PM To: Josh Rosen Cc: Armin Braun; Kay Ousterhout; dev@spark.apache.org Subject: Re: File JIRAs for all flaky test failures The issue was not a lack of tooling; I used the URL you are describing below to

Re: Spark Job Performance monitoring approaches

2017-02-16 Thread Saikat Kanjilal
There's also this: https://github.com/databricks/spark-perf (GitHub - databricks/spark-perf: Performance tests for Spark)

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
test failed this week (i.e. a test case will appear at most once in the results list). I've also exposed this as an RSS feed at https://spark-tests.appspot.com/rss/failed-tests/new. On Wed, Feb 15, 2017 at 12:51 PM Saikat Kanjilal <sxk1...@hotmail.com> wrot

Re: File JIRAs for all flaky test failures

2017-02-15 Thread Saikat Kanjilal
I was working on something to address this a while ago (https://issues.apache.org/jira/browse/SPARK-9487), but the difficulty of testing locally made each of the unit tests a lot more complicated to fix. Should we resurface this JIRA? I would wholeheartedly agree with the

Re: spark sql versus interactive hive versus hive

2017-02-11 Thread Saikat Kanjilal
s recommend generating or using data yourself that matches the data the company is using. Also keep in mind that time is needed to convert this data into an efficient format. On 10 Feb 2017, at 20:36, Saikat Kanjilal <sxk1...@hotmail.com> wrote: Folks, I'm embarki

spark sql versus interactive hive versus hive

2017-02-10 Thread Saikat Kanjilal
Folks, I'm embarking on a project to build a POC around Spark SQL. I was wondering if anyone has experience comparing Spark SQL with Hive or interactive Hive, and data points on the types of queries suited to each. I am naively assuming that Spark SQL will beat Hive in all queries given

Re: MLlib mission and goals

2017-01-24 Thread Saikat Kanjilal
In reading through this and thinking about usability: is there any interest in building a performance measurement framework around some (or maybe all) of the MLlib algorithms? I envision this as something that can be run for each release build for our end users; it may be useful for internal

Re: [PYSPARK] Python tests organization

2017-01-14 Thread Saikat Kanjilal
he.org/jira/browse/SPARK-19160 I may have to adjust some tests anyway. On 01/12/2017 07:36 PM, Saikat Kanjilal wrote: Maciej? LGTM, what do you think? I can create a JIRA and drive this. From: Holden Karau <hol...@pigscanfly.ca>

Re: [PYSPARK] Python tests organization

2017-01-12 Thread Saikat Kanjilal
Maciej? LGTM, what do you think? I can create a JIRA and drive this. From: Holden Karau <hol...@pigscanfly.ca> Sent: Thursday, January 12, 2017 10:34 AM To: Saikat Kanjilal; dev@spark.apache.org Subject: Re: [PYSPARK] Python tests organization I'd be

Re: [PYSPARK] Python tests organization

2017-01-12 Thread Saikat Kanjilal
Following up, any thoughts on next steps for this? From: Maciej Szymkiewicz <mszymkiew...@gmail.com> Sent: Wednesday, January 11, 2017 10:14 AM To: Saikat Kanjilal Subject: Re: [PYSPARK] Python tests organization Not yet, I want to see if there is any con

Re: [PYSPARK] Python tests organization

2017-01-11 Thread Saikat Kanjilal
Maciej/Reynold, If it's OK with you guys, I can start working on a proposal and create a JIRA; let me know next steps. Thanks in advance. From: Maciej Szymkiewicz <mszymkiew...@gmail.com> Sent: Wednesday, January 11, 2017 10:14 AM To: Saikat Kanjilal S

Re: [PYSPARK] Python tests organization

2017-01-11 Thread Saikat Kanjilal
Is it worth coming up with a proposal for this and floating it to dev? From: Reynold Xin <r...@databricks.com> Sent: Wednesday, January 11, 2017 9:47 AM To: Maciej Szymkiewicz; Saikat Kanjilal; dev@spark.apache.org Subject: Re: [PYSPARK] Python tests organi

Re: [PYSPARK] Python tests organization

2017-01-11 Thread Saikat Kanjilal
Hello Maciej, If there's a JIRA available for this, I'd like to help get it moving; let me know next steps. Thanks in advance. From: Maciej Szymkiewicz Sent: Wednesday, January 11, 2017 4:18 AM To: dev@spark.apache.org Subject:

Re: Spark-9487, Need some insight

2016-12-06 Thread Saikat Kanjilal
:44 PM To: Saikat Kanjilal Cc: dev@spark.apache.org Subject: Re: Spark-9487, Need some insight Honestly it is pretty difficult. Given the difficulty, would it still make sense to do that change? (the one that sets the same number of workers/parallelism across different languages in testing)

Re: Spark-9487, Need some insight

2016-12-05 Thread Saikat Kanjilal
. From: Saikat Kanjilal <sxk1...@hotmail.com> Sent: Tuesday, November 29, 2016 8:14 PM To: dev@spark.apache.org Subject: Spark-9487, Need some insight Hello Spark dev community, I took on the following JIRA item (https://github.com/apache/spark/pull/15848) and am l

Spark-9487, Need some insight

2016-11-29 Thread Saikat Kanjilal
Hello Spark dev community, I took on the following JIRA item (https://github.com/apache/spark/pull/15848) and am looking for some general pointers. It seems that I am running into issues where things work during local development on my MacBook Pro but fail on Jenkins for a