Re: Portability of Docker images built on different cloud platforms

2023-04-05 Thread Ken Peng
ashok34...@yahoo.com.INVALID wrote: Is it possible to use a Spark Docker image built on GCP on AWS without rebuilding it from scratch on AWS? I am using the Spark image from Bitnami for running on k8s. And yes, it's deployed by Helm. -- https://kenpeng.pages.dev/
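A Docker image built on GCP is a standard OCI image with no cloud-specific binding, so the same image runs on AWS as long as the cluster can pull it from a reachable registry. A minimal sketch using the Bitnami chart the poster mentions (the release name is illustrative, not from the thread):

```shell
# Deploy the same Bitnami Spark image on any Kubernetes cluster
# (EKS, GKE, ...); what matters is registry reachability, not
# where the image was originally built.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install my-spark bitnami/spark \
  --set image.registry=docker.io \
  --set image.repository=bitnami/spark
```

Private registries work the same way on either cloud via image pull secrets.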

Re: What do I lose if I run Spark without using HDFS or Zookeeper?

2016-08-26 Thread Carlile, Ken
We use Spark with NFS as the data store, mainly using Dr. Jeremy Freeman’s Thunder framework. Works very well (and I see HUGE throughput on the storage system during loads). I haven’t seen (or heard from the devs/users) a need for HDFS or S3. —Ken On Aug 25, 2016, at 8:02 PM
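Since Spark only needs a path that every executor can read, an NFS mount used with the `file://` scheme can stand in for HDFS entirely. A sketch with hypothetical paths:

```shell
# The mount point must be identical on every node; Spark then treats
# it like any other filesystem via the file:// scheme.
spark-submit --master spark://master:7077 \
  my_job.py file:///nfs/data/input file:///nfs/data/output
```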

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
Hi Deepak,  Yes, that’s about the size of it. The spark job isn’t filling the disk by any stretch of the imagination; in fact the only stuff that’s writing to the disk from Spark in certain of these instances is the logging.  Thanks, —Ken On Jun 16, 2016, at 12:17 PM

Re: Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
attempted to run with 15 cores out of 16 and 25GB of RAM out of 128. He still lost nodes.  4. He’s currently running storage benchmarking tests, which consist mainly of shuffles.  Thanks! Ken On Jun 16, 2016, at 8:00 AM, Deepak Goel <deic...@gmail.com> wrote: I am no expert, but some

Spark crashes worker nodes with multiple application starts

2016-06-16 Thread Carlile, Ken
anyone seen anything like this? Any ideas where to look next? Thanks, Ken - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

Re: spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
accordingly.  Thanks! —Ken On Apr 3, 2016, at 11:06 AM, Yong Zhang <java8...@hotmail.com> wrote: In the standalone mode, it applies to the Driver JVM processor heap size. You should consider giving enough memory space to it, in standalone mode, due to: 1) Any data you brin
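The point above can be sketched as a submit-time setting: in standalone mode `spark.driver.memory` sizes the driver JVM heap, and anything brought back with `collect()` or `take()` must fit in it (the value below is illustrative):

```shell
# Give the driver a generous heap; equivalent to setting
#   spark.driver.memory  64g
# in conf/spark-defaults.conf.
spark-submit --driver-memory 64g my_job.py
```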

spark.driver.memory meaning

2016-04-03 Thread Carlile, Ken
both 256GB nodes and 128GB nodes available for use as the driver) Thanks, Ken

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
cluster work as far as failed workers.  Thanks again,  —Ken On Mar 26, 2016, at 4:08 PM, Sven Krasser <kras...@gmail.com> wrote: My understanding is that the spark.executor.cores setting controls the number of worker threads in the executor in the JVM. Each worker
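As a hedged sketch of the setting discussed above: `spark.executor.cores` bounds the number of concurrent task threads per executor, and PySpark forks roughly one `pyspark.daemon` worker per concurrent task, so the observed daemon count should track it (though, as noted later in the thread, it can exceed it by a low multiple). Values illustrative:

```shell
# Cap concurrent tasks (and hence pyspark.daemon workers) per executor.
spark-submit \
  --conf spark.executor.cores=16 \
  --conf spark.task.cpus=1 \
  my_job.py
```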

Re: Limit pyspark.daemon threads

2016-03-26 Thread Carlile, Ken
threads?  Thanks! Ken On Mar 25, 2016, at 9:10 PM, Sven Krasser <kras...@gmail.com> wrote: Hey Ken, I also frequently see more pyspark daemons than configured concurrency, often it's a low multiple. (There was an issue pre-1.3.0 that caused this to be quite a bit

Re: Limit pyspark.daemon threads

2016-03-25 Thread Carlile, Ken
own, driving the load up. I’m hoping someone has seen something like this.  —Ken On Mar 21, 2016, at 3:07 PM, Carlile, Ken <carli...@janelia.hhmi.org> wrote: No further input on this? I discovered today that the pyspark.daemon threadcount was actually 48, which makes a littl

Re: Limit pyspark.daemon threads

2016-03-21 Thread Carlile, Ken
No further input on this? I discovered today that the pyspark.daemon threadcount was actually 48, which makes a little more sense (at least it’s a multiple of 16), and it seems to be happening at reduce and collect portions of the code.  —Ken On Mar 17, 2016, at 10:51 AM, Carlile

Re: Limit pyspark.daemon threads

2016-03-18 Thread Carlile, Ken
—Ken On Mar 17, 2016, at 10:50 AM, Ted Yu <yuzhih...@gmail.com> wrote: I took a look at docs/configuration.md Though I didn't find answer for your first question, I think the following pertains to your second question:   spark.python.worker.memory
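Note that `spark.python.worker.memory`, mentioned above, does not limit the daemon count; it caps the memory each Python worker uses during aggregation before spilling to disk. A sketch (value illustrative):

```shell
spark-submit --conf spark.python.worker.memory=2g my_job.py
```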

Limit pyspark.daemon threads

2016-03-18 Thread Carlile, Ken
30GB to play with, assuming there is no overhead outside the JVM’s 90GB heap (ha ha.) Thanks, Ken Carlile Sr. Unix Engineer HHMI/Janelia Research Campus 571-209-4363

merge join already sorted data?

2016-02-25 Thread Ken Geis
shuffling via a merge join? I know that Flink supports this, but its JDBC support is pretty lacking in general. Thanks, Ken

building spark 1.6.0 fails

2016-01-28 Thread Carlile, Ken
java version as 1.8, but I just got the same error with invalid source release: 1.8 instead of 1.7. My java -version and javac -version are reporting as 1.8.0.45, and I have the JAVA_HOME env set. Anyone have any ideas? Incidentally, building 2.0.0 from source worked fine… Thanks, Ken
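An "invalid source release" error usually means the javac the build resolves differs from what `$JAVA_HOME` points at. A quick check (a diagnostic sketch, not a guaranteed fix):

```shell
# Compare the JDK JAVA_HOME points at with the javac on PATH;
# sbt/maven may pick up either one.
echo "$JAVA_HOME"
"$JAVA_HOME/bin/javac" -version
command -v javac && javac -version
```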

Re: Application Detail UI change

2015-12-21 Thread Carlile, Ken
today: https://unscrupulousmodifier.wordpress.com/2015/07/20/running-spark-as-a-job-on-a-grid-engine-hpc-cluster-part-1 —Ken > On Dec 21, 2015, at 4:00 PM, MegaLearn <j...@megalearningllc.com> wrote: > > How do you start the Spark daemon, directly? > https://issues.apache.org/jira/

Re: LDA prediction on new document

2015-05-22 Thread Ken Geis
Dani, this appears to be addressed in SPARK-5567, scheduled for Spark 1.5.0. Ken On May 21, 2015, at 11:12 PM, user-digest-h...@spark.apache.org wrote: From: Dani Qiu zongmin@gmail.com Subject: LDA prediction on new document Date: May 21, 2015 at 8:48:40 PM PDT To: user

Re: JAVA_HOME problem with upgrade to 1.3.0

2015-03-23 Thread Williams, Ken
From: Williams, Ken <ken.willi...@windlogics.com> Date: Thursday, March 19, 2015 at 10:59 AM To: Spark list <user@spark.apache.org> Subject: JAVA_HOME problem with upgrade to 1.3.0 […] Finally, I go and check the YARN

JAVA_HOME problem with upgrade to 1.3.0

2015-03-19 Thread Williams, Ken
Log Length: 0 I’m not sure how to interpret that – is '{{JAVA_HOME}}' a literal (including the brackets) that’s somehow making it into a script? Is this coming from the worker nodes or the driver? Anything I can do to experiment or troubleshoot? -Ken
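`{{JAVA_HOME}}` is a YARN launch-context placeholder that the NodeManager expands from its own environment; seeing it literally suggests the substitution never happened on the node. One hedged workaround is to set an explicit JDK path for the containers (the path below is hypothetical):

```shell
# Force JAVA_HOME for the YARN application master and the executors.
spark-submit --master yarn \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/jdk64/jdk1.7.0_45 \
  --conf spark.executorEnv.JAVA_HOME=/usr/jdk64/jdk1.7.0_45 \
  my_job.py
```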

Re: JAVA_HOME problem with upgrade to 1.3.0

2015-03-19 Thread Williams, Ken
JAVA_HOME=/usr/jdk64/jdk1.6.0_31 -Ken CONFIDENTIALITY NOTICE: This e-mail message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution of any kind

Data Source for Spark SQL

2014-11-25 Thread ken
I am using Spark SQL from a Hive table with Parquet SerDe. Most queries are executed from Spark's JDBC Thrift server. Is there a more efficient way to access/query data? For example, using saveAsParquetFile() and parquetFile() to save/load Parquet data and run queries directly? Thanks, Ken

Re: Spark Worker node accessing Hive metastore

2014-10-29 Thread ken
Thanks Akhil. So the Spark worker node doesn't need access to the metastore to run Hive queries? If yes, which component accesses the metastore? For Hive, the Hive CLI accesses the metastore before submitting M/R jobs. Thanks, Ken -- View this message in context: http://apache-spark-user-list

Spark Worker node accessing Hive metastore

2014-10-24 Thread ken
Does a Spark worker node need access to Hive's metastore if part of a job contains Hive queries? Thanks, Ken -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Worker-node-accessing-Hive-metastore-tp17255.html Sent from the Apache Spark User List
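For context on the question above: in Spark SQL only the driver-side HiveContext contacts the metastore (to resolve table definitions); the workers then read the table's underlying files directly. The metastore address comes from a `hive-site.xml` on the driver's classpath, along these lines (the URI is hypothetical):

```xml
<!-- conf/hive-site.xml on the Spark driver -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>
```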

cache table with JDBC

2014-08-22 Thread ken
I am using Spark's Thrift server to connect to Hive and use JDBC to issue queries. Is there a way to cache a table in Spark by using a JDBC call? Thanks, Ken -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/cache-table-with-JDBC-tp12675.html Sent from
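Spark SQL's Thrift server accepts `CACHE TABLE` like any other SQL statement, so caching can be triggered over the same JDBC connection, e.g. from beeline (host, port, and table name illustrative):

```shell
# CACHE TABLE issued through the Thrift server's JDBC endpoint.
beeline -u jdbc:hive2://localhost:10000 -e "CACHE TABLE my_table"
```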

Is Spark SQL Thrift Server part of the 1.0.2 release

2014-08-20 Thread Tam, Ken K
Is Spark SQL Thrift Server part of the 1.0.2 release? If not, which release is the target? Thanks, Ken

RE: Is Spark SQL Thrift Server part of the 1.0.2 release

2014-08-20 Thread Tam, Ken K
What is the best way to run Hive queries in 1.0.2? In my case, Hive queries will be invoked from a middle-tier webapp. I am thinking to use the Hive JDBC driver. Thanks, Ken From: Michael Armbrust [mailto:mich...@databricks.com] Sent: Wednesday, August 20, 2014 9:38 AM To: Tam, Ken K Cc: user

Build times for Spark

2014-04-25 Thread Williams, Ken
of CPU time). After that, I did 'SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly' and that took 25 minutes wallclock, 73 minutes CPU. Is that typical? Or does that indicate some setup problem in my environment? -- Ken Williams, Senior Research Scientist WindLogics http://windlogics.com

Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
way to get more information about where in the process it's failing? Thanks. -- Ken Williams, Senior Research Scientist WindLogics http://windlogics.com

RE: Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
the Hadoop command-line tools do, but that's not so important. -Ken -Original Message- From: Williams, Ken [mailto:ken.willi...@windlogics.com] Sent: Monday, April 21, 2014 2:04 PM To: Spark list Subject: Problem connecting to HDFS in Spark shell I'm trying to get my feet wet with Spark

RE: Problem connecting to HDFS in Spark shell

2014-04-21 Thread Williams, Ken
-Original Message- From: Marcelo Vanzin [mailto:van...@cloudera.com] Hi Ken, On Mon, Apr 21, 2014 at 1:39 PM, Williams, Ken ken.willi...@windlogics.com wrote: I haven't figured out how to let the hostname default to the host mentioned in our /etc/hadoop/conf/hdfs-site.xml like

/bin/java not found: JAVA_HOME ignored launching shark executor

2014-04-10 Thread Ken Ellinwood
14/04/10 08:00:42 INFO AppClient$ClientActor: Executor added: app-20140410080041-0017/9 on worker-20140409145028-ken-VirtualBox-39159 (ken-VirtualBox:39159) with 4 cores 14/04/10 08:00:42 INFO SparkDeploySchedulerBackend: Granted executor ID app-20140410080041-0017/9 on hostPort ken

Re: /bin/java not found: JAVA_HOME ignored launching shark executor

2014-04-10 Thread Ken Ellinwood
Sorry, I forgot to mention this is spark-0.9.1 and shark-0.9.1. Ken On Thursday, April 10, 2014 9:02 AM, Ken Ellinwood kellinw...@yahoo.com wrote: 14/04/10 08:00:42 INFO AppClient$ClientActor: Executor added: app-20140410080041-0017/9 on worker-20140409145028-ken-VirtualBox-39159 (ken