Spark 3.3.0 with Structured Streaming from Kafka: issue with commons-pool2

2022-08-26 Thread Raymond Tang
leases to avoid issues like this? * Will this workaround cause side effects, based on your knowledge? I'm a frequent user of Spark, but I don't have much detailed knowledge of Spark's underlying code (I only look into it when I need to debug a complex problem). Thanks and Regards, Raymond
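[Editor's note] The thread's actual workaround is truncated above. As one hedged sketch, assuming the failure is a version clash between the commons-pool2 that Spark 3.3.0's Kafka connector expects and an older copy on the cluster classpath, an sbt build could pin the dependency explicitly:

    // build.sbt -- a sketch, not the thread's confirmed fix: pin commons-pool2
    // so the Kafka connector's consumer pool does not hit missing methods from
    // an older jar leaked in by the cluster environment.
    libraryDependencies ++= Seq(
      "org.apache.spark"   %% "spark-sql"            % "3.3.0" % "provided",
      "org.apache.spark"   %% "spark-sql-kafka-0-10" % "3.3.0",
      "org.apache.commons" %  "commons-pool2"        % "2.11.1" // version assumed from Spark 3.3.x deps
    )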

Spark SQL Predicate Pushdown for Hive Bucketed Table

2022-08-26 Thread Raymond Tang
f mail. Regards, Raymond Hi team, I was testing out Hive bucketed table features. One of the benefits, as most documentation suggests, is that a bucketed Hive table can be used for query filter/predicate pushdown to improve query performance. However, through my exploration, that doesn't seem to be true. C
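[Editor's note] For context, a minimal sketch of how one might verify this (table and column names are hypothetical; `spark` is a SparkSession):

    // Write a bucketed table, filter on the bucketing column, and inspect the
    // physical plan. In Spark, bucketing mainly avoids shuffles for joins and
    // aggregations; row-level filter pushdown comes from the file format's
    // statistics, which may explain the observation above.
    spark.range(0, 1000)
      .selectExpr("id", "id % 10 AS key")
      .write.bucketBy(8, "key").sortBy("key")
      .saveAsTable("bucketed_t")

    spark.table("bucketed_t").where("key = 3").explain(true)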

Re: Spark Event Log Forwarding and Offset Tracking

2021-02-04 Thread Raymond Tan
r Spark streaming jobs, is there any way to identify that data from Kafka >> is not consumed for whatever reason, or that the offsets are not progressing as >> expected, and also forward that to Elasticsearch via log4j for monitoring? >> Thanks, Raymond >> -- >> Sent from the Apache Spark User List mailing list archive >> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com. >> >
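[Editor's note] One way to surface offsets from a DStream-based job, sketched under the assumption that `stream` was created with KafkaUtils.createDirectStream (the logger name is hypothetical):

    import org.apache.log4j.Logger
    import org.apache.spark.streaming.kafka010.{HasOffsetRanges, OffsetRange}

    val log = Logger.getLogger("kafka-offset-monitor")

    // Log each batch's consumed offset ranges; a log4j appender can then
    // forward these lines to Elasticsearch for alerting on stalled offsets.
    stream.foreachRDD { rdd =>
      val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      ranges.foreach { r =>
        log.info(s"topic=${r.topic} partition=${r.partition} from=${r.fromOffset} until=${r.untilOffset}")
      }
    }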

Re: Unable to run simple spark-sql

2019-06-21 Thread Raymond Honderdors
Good to hear; it was what I thought. Hard to validate without the actual configuration (did not have time to set up Ambari). On Fri, Jun 21, 2019, 15:44 Nirmal Kumar wrote: > Hey Raymond, > > The root cause of the problem was that the hive database location was > 'file:/home/hive/spa

Re: Unable to run simple spark-sql

2019-06-18 Thread Raymond Honderdors
xyz trying creating > directory in the /home/hive directory > > Do I need some impersonation setting? > > Thanks, > Nirmal > > Get Outlook for Android<https://aka.ms/ghei36> > > > From: Nirmal Kumar > Sent: Tuesday, June 18, 2

Re: Unable to run simple spark-sql

2019-06-18 Thread Raymond Honderdors
Hi, Can you check the permissions of the user running Spark on the HDFS folder where it tries to create the table? On Tue, Jun 18, 2019, 15:05 Nirmal Kumar wrote: > Hi List, > > I tried running the following sample Java code using Spark2 version 2.0.0 > on YARN (HDP-2.5.0.0) > > public class

Anomaly when dealing with Unix timestamp

2018-06-19 Thread Raymond Xie
[truncated df.show() table output] only showing top 20 rows. All ts2 values are supposed to show dates after 20171125, yet there seems to be at least one anomaly showing 20171124. Any thoughts? Sincerely yours, Raymond
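[Editor's note] Off-by-one-day results like this usually trace back to time zones: unix-timestamp conversions render in the session/JVM zone. A hedged sketch (assuming Spark 2.2+, `ts` in epoch milliseconds, and `df` with the thread's column names):

    import org.apache.spark.sql.functions._

    // Pin the session time zone so from_unixtime is deterministic; a timestamp
    // shortly after midnight UTC can otherwise render as the previous local day.
    spark.conf.set("spark.sql.session.timeZone", "UTC")
    val withDate = df.withColumn("ts2", from_unixtime(col("ts") / 1000, "yyyyMMdd"))
    withDate.show()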

Re: Best way to process this dataset

2018-06-19 Thread Raymond Xie
Thank you, that works. Sincerely yours, Raymond On Tue, Jun 19, 2018 at 4:36 PM, Nicolas Paris wrote: > Hi Raymond > > Spark works well on a single machine too, since it benefits from multiple > cores. > The csv

Re: Best way to process this dataset

2018-06-19 Thread Raymond Xie
tu so the env differs from what I said in the original email) Dataset is 3.6GB. Thank you very much. Sincerely yours, Raymond On Tue, Jun 19, 2018 at 4:04 AM, Matteo Cossu wrote: > Single machine? Any other framework will perform bett

Best way to process this dataset

2018-06-18 Thread Raymond Xie
would like to hear any suggestions from you on how I should process the dataset with my current environment. Thank you. Sincerely yours, Raymond

Re: how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Raymond Xie
) Sincerely yours, Raymond On Sun, Jun 17, 2018 at 2:36 PM, Subhash Sriram wrote: > Hi Raymond, > > If you set your master to local[*] instead of yarn-client, it should run > on your local machine. > > Thanks, > Subhash > > Sent from my iPhon
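[Editor's note] Subhash's suggestion, as a self-contained sketch:

    import org.apache.spark.{SparkConf, SparkContext}

    // Run on all local cores instead of submitting to YARN, so no Hadoop
    // installation (and no ResourceManager at 0.0.0.0:8032) is required.
    val conf = new SparkConf().setAppName("local-test").setMaster("local[*]")
    val sc = new SparkContext(conf)
    println(sc.parallelize(1 to 100).sum())
    sc.stop()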

how can I run spark job in my environment which is a single Ubuntu host with no hadoop installed

2018-06-17 Thread Raymond Xie
0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS). The infinite retry loop starts here. Sincerely yours, Raymond

Re: Error: Could not find or load main class org.apache.spark.launcher.Main

2018-06-17 Thread Raymond Xie
:/home/rxie/Downloads/spark/bin:/usr/bin/java rxie@ubuntu:~/Downloads/spark$ spark-shell Error: Could not find or load main class org.apache.spark.launcher.Main Sincerely yours, Raymond On Sun, Jun 17, 2018 at 8:44 AM, Vamshi Talla wrote

Error: Could not find or load main class org.apache.spark.launcher.Main

2018-06-17 Thread Raymond Xie
not find or load main class org.apache.spark.launcher.Main rxie@ubuntu:~/Downloads/spark$ pwd /home/rxie/Downloads/spark rxie@ubuntu:~/Downloads/spark$ ls bin conf data examples jars kubernetes licenses R yarn Sincerely yours, Raymond

spark-shell doesn't start

2018-06-17 Thread Raymond Xie
. Sincerely yours, Raymond

spark-submit Error: Cannot load main class from JAR file

2018-06-17 Thread Raymond Xie
rely yours, Raymond

Re: Not able to sort out environment settings to start spark from windows

2018-06-16 Thread Raymond Xie
Thank you, but there is no special character or space; I actually copied it from Program Files to the root to ensure there is no space in the path. Sincerely yours, Raymond On Sat, Jun 16, 2018 at 3:42 PM, vaquar khan wrote: > Plz check ur Java H

Not able to sort out environment settings to start spark from windows

2018-06-16 Thread Raymond Xie
JAVA_HOME: C:\jdk1.8.0_151\bin JDK_HOME: C:\jdk1.8.0_151 I also copied all of C:\jdk1.8.0_151 to C:\Java\jdk1.8.0_151, and received the same error. Any help is greatly appreciated. Thanks. Sincerely yours, Raymond

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
Thank you very much, Marco. I am a beginner in this area; could you show me what you think the right script should be to get it executed in the terminal? Sincerely yours, Raymond On Sat, Feb 25, 2017 at 6:00 PM, Marco Mistroni

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
\ examples/src/main/python/streaming/kafka_wordcount.py \ localhost:2181 test` """ Can anyone share any thoughts on how to find out? Thank you very much in advance. Sincerely yours, Raymond On Sat, Feb 25, 2017 at 5:27 PM,

Re: No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
Thank you, it is still not working: [image: Inline image 1] By the way, here is the original source: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/kafka_wordcount.py Sincerely yours, Raymond On Sat, Feb

No main class set in JAR; please specify one with --class and java.lang.ClassNotFoundException

2017-02-25 Thread Raymond Xie
/hdp/2.5.0.0-1245/kafka/libs/kafka-streams-0.10.0.2.5.0.0-1245.jar Sincerely yours, Raymond

Re: How to connect Tableau to databricks spark?

2017-01-09 Thread Raymond Xie
yours, Raymond On Mon, Jan 9, 2017 at 4:53 PM, Silvio Fiorito <silvio.fior...@granturing.com> wrote: > Also, meant to add the link to the docs: https://docs.databricks.com/user-guide/faq/tableau.html > From: Silvio Fiorito <silvio.fior...

How to connect Tableau to databricks spark?

2017-01-08 Thread Raymond Xie
s account in Tableau to connect it. Thank you very much. Any clue is appreciated. Sincerely yours, Raymond

subscription

2017-01-08 Thread Raymond Xie
Sincerely yours, Raymond

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
Thank you very much, Marco. Is your code in Scala? Do you have a Python example? Can anyone give me a Python example for handling JSON data on Spark? Sincerely yours, Raymond On Sun, Jan 1, 2017 at 12:29 PM, Marco Mistroni <mmi

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
oblem here: the JSON data needs to be tweaked before it can really be used. Simply using df = sqlContext.read.json("/json/") just makes the df messy; I need the df to know the fields in the JSON file. How? Thank you. Sincerely yo
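[Editor's note] One way to make the DataFrame "know the fields" is to supply the schema up front; a sketch with hypothetical field names (the PySpark reader has the same schema() hook):

    import org.apache.spark.sql.types._

    // An explicit schema replaces inference over messy data; only the listed
    // fields are read, with the declared types.
    val schema = StructType(Seq(
      StructField("id", StringType),
      StructField("ts", LongType),
      StructField("payload", StringType)
    ))
    val df = sqlContext.read.schema(schema).json("/json/")
    df.printSchema()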

Re: Error when loading json to spark

2017-01-01 Thread Raymond Xie
n(GatewayConnection.java:209) at java.lang.Thread.run(Thread.java:745) >>> Sincerely yours, Raymond On Sat, Dec 31, 2016 at 11:52 PM, Miguel Morales <therevolti...@gmail.com> wrote: > Looks like it's trying to treat that path as a folder, try omi

From Hive to Spark, what is the default database/table

2016-12-31 Thread Raymond Xie
pt: pyspark.sql.utils.AnalysisException: u'Table not found: flight201601;' How do I write the SQL query if I want to select from flight201601? Thank you. Sincerely yours, Raymond
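[Editor's note] A sketch of the usual fix in that era, assuming the table lives in Hive's default database: go through a HiveContext (a plain SQLContext cannot see the Hive metastore) and qualify the table name:

    import org.apache.spark.sql.hive.HiveContext

    // Requires a Spark build with Hive support and hive-site.xml on the
    // classpath; `sc` is the existing SparkContext.
    val hiveContext = new HiveContext(sc)
    hiveContext.sql("SELECT * FROM default.flight201601").show()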

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-31 Thread Raymond Xie
led on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused Any thoughts? Sincerely yours, Raymond On Fri, Dec 30, 2016 at 10:08 PM, Felix Cheung <feli

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
kage/databricks/spark-csv > ------ > From: Raymond Xie <xie3208...@gmail.com> > Sent: Friday, December 30, 2016 6:46:11 PM > To: user@spark.apache.org > Subject: How to load a big csv to dataframe in Spark 1.6 > > Hello, > > I see there is usually thi

Re: How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
Yes, I believe there should be a better way to handle my case. ~~~sent from my cell phone, sorry if there is any typo On Dec 30, 2016, 10:09 PM, "write2sivakumar@gmail" <write2sivaku...@gmail.com> wrote: Hi Raymond, Your problem is to pass those 100 fields to the .toDF() method?? Sent

How to load a big csv to dataframe in Spark 1.6

2016-12-30 Thread Raymond Xie
how() However, in my case my CSV has 100+ fields, which means toDF() will be very lengthy. Can anyone tell me a practical method to load the data? Thank you very much. Raymond
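[Editor's note] The spark-csv package referenced in the replies above sidesteps a 100-column toDF(); a sketch for Spark 1.6 (the path is a placeholder):

    // The header row supplies the 100+ column names and inferSchema samples
    // the file for types, so no hand-written toDF("c1", "c2", ...) is needed.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load("/path/to/big.csv")
    df.printSchema()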

What's the best practice to load data from an RDBMS to Spark

2016-12-30 Thread Raymond Xie
Hello, I am new to Spark. As a SQL developer, I have only taken some courses online and spent some time on my own; I never had a chance to work on a real project. I wonder what the best practice (tool, procedure...) would be to load data (CSV, Excel) into the Spark platform? Thank you. Raymond
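[Editor's note] For the relational-database part of the question, a hedged sketch of the built-in JDBC data source (URL, table, and credentials are placeholders; CSV and Excel would go through file-based readers instead):

    // Read a table straight from an RDBMS into a DataFrame over JDBC.
    val orders = sqlContext.read.format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/sales")
      .option("dbtable", "orders")
      .option("user", "spark_reader")
      .option("password", "***")
      .load()
    orders.show()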

Re: Does SparkSql thrift server support insert/update/delete sql statement

2016-03-28 Thread Raymond Honderdors
It should, depending on the storage used. I am facing a similar issue running Spark on EMR; I got EMR login errors for insert. Sent from Outlook Mobile On Mon, Mar 28, 2016 at 10:31 PM -0700, "sage" > wrote: Does SparkSql

RE: Does SparkSql has official jdbc/odbc driver?

2016-03-27 Thread Raymond Honderdors
For now they are free. Sent from my Samsung Galaxy smartphone. Original message From: Sage Meng <lkke...@gmail.com> Date: 3/28/2016 04:14 (GMT+02:00) To: Raymond Honderdors <raymond.honderd...@sizmek.com> Cc: mich.talebza...@gmail.com, user@spark.apache.org

Re: Does SparkSql has official jdbc/odbc driver?

2016-03-25 Thread Raymond Honderdors
Recommended drivers for Spark / Thrift are the ones from Databricks (Simba). My experience is that the Databricks driver works perfectly on Windows and Linux. On Windows you can also get the Microsoft driver. Both are ODBC; I have not yet tried the JDBC drivers. Sent from Outlook Mobile

Re: Spark with Druid

2016-03-23 Thread Raymond Honderdors
linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani which references https://github.com/SparklineData/spark-druid-olap On Wed, Mar 23, 2016 at 5:59 AM, Raymond Honderdors <raymond.honderd...@sizmek.com> wrote:

Spark with Druid

2016-03-23 Thread Raymond Honderdors
Does anyone have a good overview of how to integrate Spark and Druid? I am now struggling with the creation of a Druid data source in Spark. Raymond Honderdors Team Lead Analytics BI Business Intelligence Developer raymond.honderd...@sizmek.com

SparkSQL 2.0 snapshot - thrift server behavior

2016-03-21 Thread Raymond Honderdors
t back for the "SHOW TABLES IN 'default'"? Raymond Honderdors Team Lead Analytics BI Business Intelligence Developer raymond.honderd...@sizmek.com T +972.7325.3569 Herzliya

Re: Why mapred for the HadoopRDD?

2014-11-04 Thread raymond
You could take a look at sc.newAPIHadoopRDD(). On Nov 5, 2014, at 9:29 AM, Corey Nolet cjno...@gmail.com wrote: I'm fairly new to Spark and I'm trying to kick the tires with a few InputFormats. I noticed the sc.hadoopRDD() method takes a mapred JobConf instead of a MapReduce Job object. Is there
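[Editor's note] A minimal sketch of that suggestion (the path and formats are illustrative; `sc` is the SparkContext):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

    // The new-API variant takes a Configuration plus InputFormat/key/value
    // classes, rather than a mapred JobConf.
    val conf = new Configuration()
    conf.set("mapreduce.input.fileinputformat.inputdir", "hdfs:///data/input")
    val rdd = sc.newAPIHadoopRDD(conf,
      classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
    println(rdd.count())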

SparkSQL , best way to divide data into partitions?

2014-10-22 Thread raymond
done by loading data into a dedicated partition directly. But what if I don't want to select data out by a specific partition and then insert it with each partition field value? How should I do it in a quick way? And how do I do it in Spark SQL? raymond
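[Editor's note] A hedged sketch of the usual "quick way", assuming a Hive-backed table and a Spark build with dynamic-partition insert support (table and column names are hypothetical):

    // Dynamic partitioning: the value of the partition column routes each row,
    // so no per-partition SELECT ... INSERT loop is needed.
    hiveContext.sql("SET hive.exec.dynamic.partition=true")
    hiveContext.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    hiveContext.sql(
      "INSERT OVERWRITE TABLE events PARTITION (dt) " +
      "SELECT id, payload, dt FROM staging")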

Re: All executors run on just a few nodes

2014-10-20 Thread raymond
happened in your cluster, but it seems likely to me. Your case might not be that executors don't run on all Spark nodes, but that they don't get registered quickly enough. On Oct 20, 2014, at 2:15 PM, Tao Xiao xiaotao.cs@gmail.com wrote: Raymond, Thank you. But I read from other thread

RE: RDDs

2014-09-04 Thread Liu, Raymond
on the same RDD, it doesn't matter whether the RDD is replicated or not. You can always do it if you wish to. Best Regards, Raymond Liu -Original Message- From: Kartheek.R [mailto:kartheek.m...@gmail.com] Sent: Thursday, September 04, 2014 1:24 PM To: u...@spark.incubator.apache.org Subject

RE: memory size for caching RDD

2014-09-04 Thread Liu, Raymond
limit. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 2:27 PM To: Patrick Wendell Cc: user@spark.apache.org; d...@spark.apache.org Subject: Re: memory size for caching RDD But is it possible to make t resizable? When we don't have many RDD to cache

RE: memory size for caching RDD

2014-09-04 Thread Liu, Raymond
Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 2:57 PM To: Liu, Raymond Cc: Patrick Wendell; user@spark.apache.org; d...@spark.apache.org Subject: Re: memory size for caching RDD Oh I see. I want to implement something like this: sometimes I need

RE: RDDs

2014-09-03 Thread Liu, Raymond
. For Spark, an application is your user program, and a job is an internal scheduling concept: a group of RDD operations. Your application might invoke several jobs. Best Regards, Raymond Liu From: rapelly kartheek [mailto:kartheek.m...@gmail.com] Sent: Wednesday, September 03, 2014 5

RE: resize memory size for caching RDD

2014-09-03 Thread Liu, Raymond
AFAIK, No. Best Regards, Raymond Liu From: 牛兆捷 [mailto:nzjem...@gmail.com] Sent: Thursday, September 04, 2014 11:30 AM To: user@spark.apache.org Subject: resize memory size for caching RDD Dear all: Spark uses memory to cache RDD and the memory size is specified
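[Editor's note] For context, that fixed budget is set once at startup; a sketch of the era's knob (not resizable while the application runs, per the answer above):

    import org.apache.spark.SparkConf

    // Pre-1.6 static memory management: roughly this fraction of the executor
    // heap is reserved for cached RDDs.
    val conf = new SparkConf().set("spark.storage.memoryFraction", "0.6")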

RE: how to filter value in spark

2014-08-31 Thread Liu, Raymond
You could use cogroup to combine RDDs into one RDD for cross-reference processing, e.g. a.cogroup(b).filter { case (_, (l, r)) => l.nonEmpty && r.nonEmpty }.map { case (k, (l, r)) => (k, l) } Best Regards, Raymond Liu -Original Message- From: marylucy [mailto:qaz163wsx_...@hotmail.com] Sent

RE: What is a Block Manager?

2014-08-27 Thread Liu, Raymond
, Raymond Liu From: Victor Tso-Guillen [mailto:v...@paxata.com] Sent: Wednesday, August 27, 2014 1:40 PM To: Liu, Raymond Cc: user@spark.apache.org Subject: Re: What is a Block Manager? We're a single-app deployment so we want to launch as many executors as the system has workers. We accomplish

RE: What is a Block Manager?

2014-08-26 Thread Liu, Raymond
). Best Regards, Raymond Liu From: Victor Tso-Guillen [mailto:v...@paxata.com] Sent: Tuesday, August 26, 2014 11:42 PM To: user@spark.apache.org Subject: What is a Block Manager? I'm curious not only about what they do, but what their relationship is to the rest of the system. I find that I get

RE: Request for help in writing to Textfile

2014-08-25 Thread Liu, Raymond
You can try to manipulate the string you want to output before saveAsTextFile, something like: flatMap(x => x).map { x => val s = x.toString; s.subSequence(1, s.length - 1) } There should be a more optimized way. Best Regards, Raymond Liu -Original Message- From: yh18190

RE: About StorageLevel

2014-06-26 Thread Liu, Raymond
I think there is a shuffle stage involved, and the future count job will depend on the first job's shuffle stage's output data directly, as long as it is still available. Thus it will be much faster. Best Regards, Raymond Liu From: tomsheep...@gmail.com [mailto:tomsheep...@gmail.com] Sent

RE: When does Spark switch from PROCESS_LOCAL to NODE_LOCAL or RACK_LOCAL?

2014-06-05 Thread Liu, Raymond
If some tasks have no locality preference, they will also show up as PROCESS_LOCAL; I think we probably need to name it NO_PREFER to make it clearer. Not sure if this is your case. Best Regards, Raymond Liu From: coded...@gmail.com [mailto:coded...@gmail.com] On Behalf Of Sung Hwan Chung
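[Editor's note] The related knob, for reference (the value is illustrative; Spark 1.x took plain milliseconds):

    import org.apache.spark.SparkConf

    // How long the scheduler waits for a process-local slot before falling
    // back to NODE_LOCAL, then RACK_LOCAL.
    val conf = new SparkConf().set("spark.locality.wait", "1000")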

RE: yarn-client mode question

2014-05-21 Thread Liu, Raymond
It seems you are asking whether Spark-related jars need to be deployed to the YARN cluster manually before you launch the application? No, you don't, just as with any other YARN application. And it doesn't matter whether it is yarn-client or yarn-cluster mode. Best Regards, Raymond Liu -Original Message

RE: different in spark on yarn mode and standalone mode

2014-05-04 Thread Liu, Raymond
of resource scheduling goes through the same process, say between driver and executor through an Akka actor. Best Regards, Raymond Liu -Original Message- From: Sophia [mailto:sln-1...@163.com] Hey you guys, What is the difference between Spark on YARN mode and standalone mode regarding resource scheduling

RE: How fast would you expect shuffle serialize to be?

2014-04-30 Thread Liu, Raymond
. So it seems to me that when running the full-path code in my previous case, 32 cores with 50MB/s total throughput is reasonable? Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] The latter case: total throughput aggregated across all cores

RE: Shuffle Spill Issue

2014-04-29 Thread Liu, Raymond
Spill (memory) is abnormal and sounds to me like it should not trigger at all. By the way, this behavior only occurs on the map output side; on the reduce / shuffle fetch side, this strange behavior won't happen. Best Regards, Raymond Liu From: Daniel Darabos [mailto:daniel.dara...@lynxanalytics.com] I

How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
to you? If not, is there anything that might possibly need to be examined in my case? Best Regards, Raymond Liu

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
For all the tasks, say 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Is this the serialization throughput per task or the serialization throughput for all the tasks? On Tue, Apr 29, 2014 at 9:34 PM, Liu, Raymond

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
directly instead of reading from HDFS, with similar throughput results) Best Regards, Raymond Liu -Original Message- From: Liu, Raymond [mailto:raymond@intel.com] For all the tasks, say 32 tasks in total. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell

RE: How fast would you expect shuffle serialize to be?

2014-04-29 Thread Liu, Raymond
The latter case: total throughput aggregated across all cores. Best Regards, Raymond Liu -Original Message- From: Patrick Wendell [mailto:pwend...@gmail.com] Sent: Wednesday, April 30, 2014 1:22 PM To: user@spark.apache.org Subject: Re: How fast would you expect shuffle serialize to be? Hm

Shuffle Spill Issue

2014-04-28 Thread Liu, Raymond
. So, has anyone encountered this issue? By the way, I am using the latest trunk code. Best Regards, Raymond Liu

RE: Shuffle Spill Issue

2014-04-28 Thread Liu, Raymond
Regards, Raymond Liu From: Patrick Wendell [mailto:pwend...@gmail.com] Could you explain more about what your job is doing and what data types you are using? These numbers alone don't necessarily indicate something is wrong. The relationship between the in-memory and on-disk shuffle amount