Re: Convert each partition of RDD to Dataframe

2020-02-27 Thread prosp4300
Looks like there is no obvious relationship between the partitions or tables, so maybe try to make them separate jobs; that way they could run at the same time and fully use the cluster resources. prosp4300 <prosp4...@163.com> On 02/27/2020 22:50, Manjunath
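
Not from the thread itself, but a minimal sketch of the "separate jobs" idea, assuming the per-partition work can be expressed as independent actions in a single application: Spark's scheduler accepts jobs from multiple threads, so they can run concurrently. The table names and output path below are hypothetical.

    from concurrent.futures import ThreadPoolExecutor
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("concurrent-jobs").getOrCreate()

    def export_table(name):
        # Each call triggers its own Spark job; jobs submitted from
        # different threads can run at the same time on the cluster.
        spark.table(name).write.mode("overwrite").parquet("/tmp/out/" + name)

    tables = ["table_a", "table_b", "table_c"]  # hypothetical tables
    with ThreadPoolExecutor(max_workers=len(tables)) as pool:
        list(pool.map(export_table, tables))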

Re:Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300
classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of them manual (YARN is, I think, the only RM supported in Spark where the application itself can do it). On Thu, Dec 20, 2018 at 1:48 PM pros

Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300
Hi, Spark Users I'm playing with Spark metric monitoring and want to add a custom sink, an HttpSink that sends the metrics through a RESTful API. A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and packaged within the application jar. It works for the driver instance, but once
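
For context, custom sinks are normally registered through the metrics configuration; a sketch is below (the sink name "http" and the polling settings are assumptions, not taken from the thread). As the reply above points out, the executor-side ClassNotFound usually means the jar containing HttpSink is only on the application classpath; shipping the jar and adding it to spark.executor.extraClassPath, rather than relying on --jars alone, is one way to get it onto the system classpath.

    # metrics.properties sketch -- sink name and polling settings are assumptions
    *.sink.http.class=org.apache.spark.metrics.sink.HttpSink
    *.sink.http.period=10
    *.sink.http.unit=seconds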

Spray Client VS PlayWS vs Spring RestTemplate within Spark Job

2016-09-06 Thread prosp4300
Hi, Spark Users As far as I know, Spray Client depends on an Akka ActorSystem; does this dependency theoretically mean it is not possible to use spray-client in a Spark job that runs on Spark executor nodes? I believe PlayWS should work as a RESTful client run from a Spark executor; how about

Re:Do we still need to use Kryo serializer in Spark 1.6.2 ?

2016-08-23 Thread prosp4300
The way to use the Kryo serializer is similar to Scala, like below; the only difference is the lack of the convenience method "conf.registerKryoClasses", but it should be easy to make one yourself: conf = SparkConf() conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
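
Expanded into a minimal PySpark sketch (the class name passed to spark.kryo.classesToRegister is a hypothetical example):

    from pyspark import SparkConf, SparkContext

    conf = SparkConf()
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    # PySpark has no conf.registerKryoClasses, but the underlying Spark
    # property can be set directly with a comma-separated list of JVM classes.
    conf.set("spark.kryo.classesToRegister", "org.example.MyCustomClass")

    sc = SparkContext(conf=conf)

Note that Kryo only affects serialization of JVM objects (e.g. during shuffles); Python records themselves are still pickled.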

Re:Log rollover in spark streaming jobs

2016-08-23 Thread prosp4300
Spark on YARN supports a customized log4j configuration by default; a RollingFileAppender can be used to avoid filling the disk, as documented below: If you need a reference to the proper location to put log files in YARN so that YARN can properly display and aggregate them, use
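
A sketch of such a log4j.properties is below; the file name, size limit, and backup count are illustrative assumptions, while ${spark.yarn.app.container.log.dir} is the property the YARN documentation refers to.

    # log4j.properties sketch -- rolling file appender for YARN containers
    log4j.rootLogger=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.File=${spark.yarn.app.container.log.dir}/spark.log
    log4j.appender.rolling.MaxFileSize=50MB
    log4j.appender.rolling.MaxBackupIndex=5
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n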

Re:Re:Re: [ANNOUNCE] Announcing Apache Spark 2.0.0

2016-07-27 Thread prosp4300
The page mentioned before is the release notes, which are missing the links: http://spark.apache.org/releases/spark-release-2-0-0.html#mllib At 2016-07-27 15:56:00, "prosp4300" <prosp4...@163.com> wrote: Additionally, in the paragraph about MLlib, three links are missing; it is be

Re:Re: [ANNOUNCE] Announcing Apache Spark 2.0.0

2016-07-27 Thread prosp4300
Additionally, in the paragraph about MLlib, three links are missing; it would be better to provide the links to give us more information, thanks a lot: "See this blog post for details", "See this talk to learn more", "This talk lists many of these new features." On 2016-07-27 15:18:41, "Ofir Manor"

Re:Re: ORC v/s Parquet for Spark 2.0

2016-07-27 Thread prosp4300
Thanks for this immediate correction :) On 2016-07-27 15:17:54, "Gourav Sengupta" wrote: Sorry, in my email above I was referring to KUDU, and there it goes: how can KUDU be right if it is mentioned in forums first with a wrong spelling. It's got a difficult beginning

Re:[ANNOUNCE] Announcing Apache Spark 2.0.0

2016-07-27 Thread prosp4300
Congratulations! On 2016-07-27 14:00:22, "Reynold Xin" wrote: Hi all, Apache Spark 2.0.0 is the first release of the Spark 2.x line. It includes 2500+ patches from 300+ contributors. To download Spark 2.0, head over to the download page:

Re:Re: RE: Error not found value sqlContext

2015-11-23 Thread prosp4300
fetching data from an RDBMS by implementing JDBCRDD. I tried a couple of DataFrame-related methods, and most of them error out stating that the method has been overloaded. Please let me know if any further inputs are needed to analyze it. Regards, Satish Chandra On Fri, Nov 20, 2015 at 5:46 PM, prosp4300 <pros

Re: Does spark sql support column indexing

2015-08-19 Thread prosp4300
The answer is simply NO, but I hope someone could give deeper insight or any meaningful reference. On 2015-08-19 15:21, Todd wrote: I don't find related talk on whether Spark SQL supports column indexing. If it does, is there a guide on how to do it? Thanks.

Re: Spark DataFrames uses too many partition

2015-08-13 Thread prosp4300
Hi, I want to know how you coalesce the partitions to one to improve the performance. Thanks On 2015-08-11 23:31, Al M wrote: I am using DataFrames with Spark 1.4.1. I really like DataFrames but the partitioning makes no sense to me. I am loading lots of very small files and joining them together.
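
A minimal sketch of the coalesce being asked about, using the modern SparkSession API (the thread was on Spark 1.4.1, where the equivalent calls live on SQLContext); the input DataFrames and output path are synthetic placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Hypothetical inputs standing in for the many small files being joined.
    df1 = spark.range(1000).withColumnRenamed("id", "key")
    df2 = spark.range(1000).withColumnRenamed("id", "key")

    joined = df1.join(df2, "key")
    # coalesce() reduces the partition count without a full shuffle;
    # use repartition() instead if the data should be evenly redistributed.
    joined.coalesce(1).write.mode("overwrite").parquet("/tmp/joined_out")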

Re:SparkSQL 1.4 can't accept registration of UDF?

2015-07-14 Thread prosp4300
What's the result of "list jar" in both 1.3.1 and 1.4.0? Please check if there is any difference. At 2015-07-15 08:10:44, ogoh oke...@gmail.com wrote: Hello, I am using SparkSQL along with the ThriftServer so that we can access it using Hive queries. With Spark 1.3.1, I can register a UDF function.

Re:Spark query

2015-07-08 Thread prosp4300
As mentioned in the Spark SQL programming guide, Spark SQL supports Hive UDFs; please take a look at the built-in date UDFs of Hive below. Getting the day of year should be as simple as in an existing RDBMS: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions At 2015-07-09
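
As an illustration (not from the thread), one way to compute day-of-year with the date functions listed on that page is datediff against January 1st of the same year; the table and column names below are made up, and newer Spark versions also ship a built-in dayofyear function.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.createDataFrame([("2015-07-09",)], ["dt"]).createOrReplaceTempView("events")

    # day-of-year = days since January 1st of the same year, plus one
    spark.sql("""
        SELECT dt,
               datediff(dt, concat(cast(year(dt) AS string), '-01-01')) + 1 AS day_of_year
        FROM events
    """).show()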

Re: HiveContext throws org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient

2015-07-07 Thread prosp4300
Hi, bdev Derby is the default embedded DB for the Hive MetaStore if you do not specify hive.metastore.uris; please take a look at the lib directory of Hive, where you can find the Derby jar. Spark does not require Derby by default. At 2015-07-07 17:07:28, bdev buntu...@gmail.com wrote: Just
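
For reference, a sketch of the hive-site.xml entry that points at a remote metastore service instead of the embedded Derby default; the host is a placeholder and 9083 is the conventional metastore port.

    <configuration>
      <!-- hive-site.xml sketch: use a remote Hive metastore instead of Derby -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://metastore-host:9083</value>
      </property>
    </configuration>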

Re:Maintain Persistent Connection with Hive meta store

2015-07-07 Thread prosp4300
Each time you run the jar, a new JVM is started; maintaining a connection across different JVMs is not the right way to think about it ("each time when I run that jar it tries to make connection with hive metastore"). At 2015-07-07 17:07:06, wazza rajeshkumarit8...@gmail.com wrote: Hi I am new to

Re: Performance tuning in Spark SQL.

2015-07-02 Thread prosp4300
Please see the link below for the available options: https://spark.apache.org/docs/1.3.1/sql-programming-guide.html#performance-tuning For example, reducing spark.sql.shuffle.partitions from 200 to 10 could improve the performance significantly.
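
A minimal sketch of applying that particular setting; the value 10 is just the example from the message and should be tuned to the data volume.

    from pyspark.sql import SparkSession

    # Fewer shuffle partitions keep small joins/aggregations from spawning
    # 200 mostly empty tasks.
    spark = (SparkSession.builder
             .config("spark.sql.shuffle.partitions", "10")
             .getOrCreate())

    # It can also be changed at runtime for the current session:
    spark.conf.set("spark.sql.shuffle.partitions", "10")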

DataFrame registerTempTable Concurrent Access

2015-06-30 Thread prosp4300
not functional programming, correct? Thanks a lot prosp4300