Re: Introducing English SDK for Apache Spark - Seeking Your Feedback and Contributions

2023-07-03 Thread Gavin Ray
Wow, really neat -- thanks for sharing! On Mon, Jul 3, 2023 at 8:12 PM Gengliang Wang wrote: > Dear Apache Spark community, > > We are delighted to announce the launch of a groundbreaking tool that aims > to make Apache Spark more user-friendly and accessible - the English SDK >

Re: Complexity with the data

2022-05-25 Thread Gavin Ray
Forgot to reply-all on the last message, whoops. Not very good at email. You need to normalize the CSV with a parser that can escape commas inside of strings. Not sure if Spark has an option for this? On Wed, May 25, 2022 at 4:37 PM Sid wrote: > Thank you so much for your time. > > I have data like
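A minimal sketch of the normalization suggested in the reply above, assuming the commas in question sit inside double-quoted fields. The sample data, the pipe delimiter, and the variable names are made up for illustration and are not taken from the thread; the parser is Python's standard csv module, not Spark.

```python
import csv
import io

# Hypothetical sample: the second field contains a comma inside quotes.
raw = 'id,address,city\n1,"12 Main St, Apt 4",Springfield\n'

# csv.reader honors the quotes, so the embedded comma stays inside one field.
rows = list(csv.reader(io.StringIO(raw)))

# Re-emit with a delimiter that does not occur in the data (here a pipe),
# so a naive downstream splitter cannot mistake data commas for separators.
out = io.StringIO()
csv.writer(out, delimiter="|", lineterminator="\n").writerows(rows)
normalized = out.getvalue()

print(rows[1])
print(normalized, end="")
```

On the Spark side, the built-in CSV reader exposes `quote` and `escape` options, which may make a separate normalization pass unnecessary when the input is consistently quoted.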

Re: [Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-18 Thread Gavin Ray
", "true") > .config("spark.sql.cbo.joinReorder.enabled", "true") > .config("spark.sql.cbo.planStats.enabled", "true") > .config("spark.sql.cbo.starSchemaDetection", "true") If you're running on more recent JDKs, you'l

[SQL] Why does a small two-source JDBC query take ~150-200ms with all optimizations (AQE, CBO, pushdown, Kryo, unsafe) enabled? (v3.4.0-SNAPSHOT)

2022-05-18 Thread Gavin Ray
I did some basic testing of multi-source queries with the most recent Spark: https://github.com/GavinRay97/spark-playground/blob/44a756acaee676a9b0c128466e4ab231a7df8d46/src/main/scala/Application.scala#L46-L115 The output of "spark.time()" surprised me: SELECT p.id, p.name, t.id, t.title FROM

[Spark SQL]: Configuring/Using Spark + Catalyst optimally for read-heavy transactional workloads in JDBC sources?

2022-05-16 Thread Gavin Ray
Hi all, I've not got much experience with Spark, but have been reading the Catalyst and Datasources V2 code/tests to try to get a basic understanding. I'm interested in trying Catalyst's query planner + optimizer for queries spanning one-or-more JDBC sources. Somewhat unusually, I'd like to do

unsubscribe

2022-05-02 Thread Ray Qiu

Re: No SparkR on Mesos?

2016-09-08 Thread ray
Hi, Rodrick, Interesting. SparkR is not expected to work with Mesos due to lack of support for Mesos in some places, and it has not been tested yet. Have you modified the Spark source code yourself? Have you deployed the Spark binary distribution on all slave nodes, and set

How to build Spark with my own version of Hadoop?

2015-07-21 Thread Dogtail Ray
Hi, I have modified some Hadoop code and want to build Spark with the modified version of Hadoop. Do I need to change the compilation dependency files? If so, how? Many thanks!

Re: init / shutdown for complex map job?

2014-12-28 Thread Ray Melton
A follow-up to the blog cited below was hinted at, per But Wait, There's More ... To keep this post brief, the remainder will be left to a follow-up post. Is this follow-up pending? Is it sort of pending? Did the follow-up happen, but I just couldn't find it on the web? Regards, Ray. On Sun

Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-15 Thread Ray
there for almost 1 hour. I guess I can only go with random initialization in KMeans. Thanks again for your help. Ray -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16530.html Sent from the Apache Spark User List

Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Ray
=1 spark.storage.blockManagerHeartBeatMs=3 --driver-memory 2g --executor-memory 2g --num-executors 100 I am running spark-submit on YARN. The Spark version is 1.1.0, and Hadoop is 2.4.1. Could you please share some comments/insights? Thanks a lot. Ray -- View this message in context

Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Ray
observable hanging. Hopefully this provides more information. Thanks. Ray -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16417.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Ray
see it got 201 executors (as shown below). http://apache-spark-user-list.1001560.n3.nabble.com/file/n16428/spark_core.png http://apache-spark-user-list.1001560.n3.nabble.com/file/n16428/spark_executor.png Thanks. Ray -- View this message in context: http://apache-spark-user-list

Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Ray
be an active stage with an incomplete progress bar in the UI. Am I wrong? Thanks, Burak! Ray -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16438.html Sent from the Apache Spark User List mailing

Re: Spark KMeans hangs at reduceByKey / collectAsMap

2014-10-14 Thread Ray
, it just finished quickly~~ In your test on mnist8m, did you use KMeans++ as the initialization mode? How long did it take? Thanks again for your help. Ray -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-KMeans-hangs-at-reduceByKey-collectAsMap-tp16413p16450

Spark cluster spanning multiple data centers

2014-07-23 Thread Ray Qiu
anyone tried this? Thanks, Ray