No overwrite flag for saveAsXXFile

2015-03-06 Thread Jeff Zhang
Hi folks, I found that RDD's saveAsXXFile methods have no overwrite flag, which I think would be very helpful. Is there any reason for this? -- Best Regards Jeff Zhang
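A minimal sketch of the usual workaround, assuming a hypothetical output path: delete the output directory through the Hadoop FileSystem API before saving, since saveAsTextFile and friends refuse to write to an existing path. (Setting spark.hadoop.validateOutputSpecs=false skips the existence check instead, but can leave stale files behind.)

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("save-with-overwrite"))
    val out = "hdfs:///tmp/output"            // hypothetical output path

    // Remove any previous output so saveAsTextFile doesn't fail on an existing path.
    val fs = FileSystem.get(sc.hadoopConfiguration)
    fs.delete(new Path(out), true)            // recursive = true; returns false if absent

    sc.parallelize(1 to 100).saveAsTextFile(out)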

Re: Is the RDD's Partitions determined before hand ?

2015-03-03 Thread Jeff Zhang
of RDDs, possibly requiring a shuffle. On Tue, Mar 3, 2015 at 10:21 AM, Jeff Zhang zjf...@gmail.com wrote: I mean is it possible to change the partition number at runtime. Thanks -- Best Regards Jeff Zhang -- Best Regards Jeff Zhang

Re: Is the RDD's Partitions determined before hand ?

2015-03-04 Thread Jeff Zhang
of execution slots). If you know a stage needs unusually high parallelism for example you can repartition further for that stage. On Mar 4, 2015 1:50 AM, Jeff Zhang zjf...@gmail.com wrote: Thanks Sean. But if the partitions of RDD is determined before hand, it would not be flexible to run
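The underlying point of this thread: an RDD's partition count is fixed when the RDD is defined, but repartition/coalesce return a new RDD with different parallelism, which is how you adjust a stage at runtime. A minimal sketch (path hypothetical):

    val lines = sc.textFile("hdfs:///data/input", 8)  // at least 8 partitions
    println(lines.partitions.length)                  // fixed for this RDD

    // repartition returns a NEW RDD with a different partition count (full shuffle);
    // coalesce reduces partitions without a full shuffle. The original RDD is unchanged.
    val wide   = lines.repartition(64)
    val narrow = wide.coalesce(4)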

Re: Filter operation to return two RDDs at once.

2015-06-03 Thread Jeff Zhang
? val (qtSessionsWithQt, guidUidMapSessions) = rawQtSession. *magicFilter*(_._2.qualifiedTreatmentId != NULL_VALUE) -- Deepak -- Best Regards Jeff Zhang

Re: Filter operation to return two RDDs at once.

2015-06-03 Thread Jeff Zhang
:32 AM Jeff Zhang zjf...@gmail.com wrote: As far as I know, Spark doesn't support multiple outputs On Wed, Jun 3, 2015 at 2:15 PM, ayan guha guha.a...@gmail.com wrote: Why do you need to do that if filter and content of the resulting rdd are exactly same? You may as well declare them as 1 RDD
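Since a single-pass split into two RDDs isn't supported, the common pattern is to cache the parent once and filter it twice. A minimal sketch with a hypothetical predicate:

    val raw = sc.parallelize(Seq(1, -2, 3, -4)).cache()  // cached so both filters reuse one computation

    val matching    = raw.filter(_ > 0)
    val nonMatching = raw.filter(_ <= 0)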

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread Jeff Zhang
, Patcharee - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Best Regards Jeff Zhang

Re: Why does driver transfer application jar to executors?

2015-06-17 Thread Jeff Zhang
with TaskDescription. Regards. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Best Regards Jeff Zhang

Re: Transformation not happening for reduceByKey or GroupByKey

2015-08-21 Thread Jeff Zhang
local --jars postgresql-9.4-1201.jar -i ScriptFile Please let me know what is missing in my code, as my resultant Array is empty Regards, Satish -- Best Regards Jeff Zhang

Re: SparkPi is geting java.lang.NoClassDefFoundError: scala/collection/Seq

2015-08-16 Thread Jeff Zhang
$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ... 6 more Process finished with exit code 1 Thanks, Xiaohe -- Best Regards Jeff Zhang

Re: Spark Master HA on YARN

2015-08-16 Thread Jeff Zhang
...@gmail.com wrote: Hi, Is Spark master high availability supported on YARN (yarn-client mode) analogous to https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability ? Thanks Bhaskie -- Best Regards Jeff Zhang

DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Jeff Zhang
] == Physical Plan == Scan JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1] -- Best Regards Jeff Zhang

Re: help plz! how to use zipWithIndex to each subset of a RDD

2015-07-29 Thread Jeff Zhang
Spark User List mailing list archive http://apache-spark-user-list.1001560.n3.nabble.com/ at Nabble.com. -- Best Regards Jeff Zhang

Re: How to control Spark Executors from getting Lost when using YARN client mode?

2015-08-04 Thread Jeff Zhang
at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Thanks Regards, Ashwin Giridharan -- Best Regards Jeff Zhang

Re: Spark on YARN

2015-07-30 Thread Jeff Zhang
Regards Jeff Zhang

Re: Always two tasks slower than others, and then job fails

2015-08-14 Thread Jeff Zhang
...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Best Regards Jeff Zhang

Re: Job is Failing automatically

2015-08-11 Thread Jeff Zhang
Regards Jeff Zhang

Re: Spark Job Hangs on our production cluster

2015-08-11 Thread Jeff Zhang
- To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Best Regards Jeff Zhang

Re: Console log file of CoarseGrainedExecutorBackend

2015-07-16 Thread Jeff Zhang
By default it is in ${SPARK_HOME}/work/${APP_ID}/${EXECUTOR_ID} On Thu, Jul 16, 2015 at 3:43 PM, Tao Lu taolu2...@gmail.com wrote: Hi, Guys, Where can I find the console log file of CoarseGrainedExecutorBackend process? Thanks! Tao -- Best Regards Jeff Zhang

Re: DataFrame#show cost 2 Spark Jobs ?

2015-08-24 Thread Jeff Zhang
provide the schema while loading the json file, like below: sqlContext.read.schema(xxx).json(“…”)? Hao *From:* Jeff Zhang [mailto:zjf...@gmail.com] *Sent:* Monday, August 24, 2015 6:20 PM *To:* user@spark.apache.org *Subject:* DataFrame#show cost 2 Spark Jobs ? It's weird to me
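Per the suggestion quoted above, one of the two jobs is the scan that infers the JSON schema; supplying the schema explicitly avoids it. A sketch with a hypothetical schema for people.json, assuming a sqlContext in scope:

    import org.apache.spark.sql.types._

    val schema = StructType(Seq(
      StructField("age", LongType, nullable = true),
      StructField("name", StringType, nullable = true)))

    // No inference pass: read.json trusts the supplied schema.
    val df = sqlContext.read.schema(schema).json("examples/src/main/resources/people.json")
    df.show()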

Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Jeff Zhang
that I don't know. -- Best Regards Jeff Zhang

Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Jeff Zhang
n use the RDD#union() (or ++) method to concatenate >> multiple rdds. For example: >> >> val lines1 = sc.textFile("file1") >> val lines2 = sc.textFile("file2") >> >> val rdd = lines1 union lines2 >> >> regards, >> --Jakob >>

Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-12 Thread Jeff Zhang
Didn't notice that I can pass comma-separated paths in the existing API (SparkContext#textFile), so no new API is necessary. Thanks all. On Thu, Nov 12, 2015 at 10:24 AM, Jeff Zhang <zjf...@gmail.com> wrote: > Hi Pradeep > > >>> Looks like what I was suggesting doesn't work. :
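For reference, this is the thread's resolution: SparkContext#textFile accepts comma-separated paths (and glob patterns), behavior inherited from Hadoop's FileInputFormat. Paths below are hypothetical:

    val rdd1 = sc.textFile("hdfs:///data/file1.txt,hdfs:///data/file2.txt")
    val rdd2 = sc.textFile("hdfs:///logs/2015-11-*/part-*")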

Re: No spark examples jar in maven repository after 1.1.1 ?

2015-11-16 Thread Jeff Zhang
e they're not a library? they're example code, not > something you build an app on. > > On Mon, Nov 16, 2015 at 9:27 AM, Jeff Zhang <zjf...@gmail.com> wrote: > > I don't find spark examples jar in maven repository after 1.1.1. Any > reason > > for that ? > > > > http:

No spark examples jar in maven repository after 1.1.1 ?

2015-11-16 Thread Jeff Zhang
I don't find the Spark examples jar in the Maven repository after 1.1.1. Any reason for that? http://mvnrepository.com/artifact/org.apache.spark/spark-examples_2.10 -- Best Regards Jeff Zhang

Re: Why there's no api for SparkContext#textFiles to support multiple inputs ?

2015-11-11 Thread Jeff Zhang
> list. I haven't tried this, but I think you should just be able to do > sc.textFile("file1,file2,...") > > On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote: > >> I know these workaround, but wouldn't it be more convenient and >> straight

Re: ResultStage's parent stages only ShuffleMapStages?

2015-11-06 Thread Jeff Zhang
e, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: pyspark groupbykey throwing error: unpack requires a string argument of length 4

2015-10-19 Thread Jeff Zhang
dow,k.srch_adults_count, > k.srch_children_count,k.srch_room_count), > (k[0:54]))) > BB = B.groupByKey() > BB.take(1) > > > best fahad > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

How to use And Operator in filter (PySpark)

2015-10-21 Thread Jeff Zhang
I can do it in the Scala API, but I'm not sure what the syntax is in PySpark (didn't find it in the Python API). Here's what I tried; both failed: >>> df.filter(df.age>3 & df.name=="Andy").collect() >>> df.filter(df.age>3 and df.name=="Andy").collect() -- Best Regards Jeff Zhang
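In PySpark, `and` cannot be overloaded on Column, and `&` binds tighter than `>` and `==`, so each comparison needs its own parentheses. A sketch (in Python, since this thread is PySpark-specific):

    # '&' has higher precedence than '>' and '==', so parenthesize each comparison.
    df.filter((df.age > 3) & (df.name == "Andy")).collect()

    # Equivalent SQL-style string condition.
    df.filter("age > 3 AND name = 'Andy'").collect()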

Re: Location preferences in pyspark?

2015-10-20 Thread Jeff Zhang
; > - Philip > > -- Best Regards Jeff Zhang

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-20 Thread Jeff Zhang
BTW, I think the JSON parser should at least verify the JSON format when inferring the schema. On Wed, Oct 21, 2015 at 12:59 PM, Jeff Zhang <zjf...@gmail.com> wrote: > I think this is due to the json file format. DataFrame can only accept > json file with one valid rec
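"One valid record per line" means the reader expects line-delimited JSON: each line is a self-contained object, not one pretty-printed document spanning lines. A file in the accepted shape:

    {"name": "Michael", "age": 29}
    {"name": "Andy", "age": 30}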

Re: Reading JSON in Pyspark throws scala.MatchError

2015-10-20 Thread Jeff Zhang
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457) > > at > > > org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418) > > at > org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) > > > > It seems like this issue has been resolved in scala per SPARK-3390 > > <https://issues.apache.org/jira/browse/SPARK-3390> ; any thoughts on > the > > root cause of this in pyspark? > > > > > > > > -- > > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Reading-JSON-in-Pyspark-throws-scala-MatchError-tp24911.html > > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > > > - > > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > > For additional commands, e-mail: user-h...@spark.apache.org > > > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: The auxService:spark_shuffle does not exist

2015-07-07 Thread Jeff Zhang
. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org -- Best Regards Jeff Zhang

Re: Spark build error

2015-11-17 Thread Jeff Zhang
.Symbols$ClassSymbol.companionModule(Symbols.scala:2991) >> at >> scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371) >> at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120) >> at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583) >> at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557) >> at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553) >> at scala.tools.nsc.Global$Run.compile(Global.scala:1662) >> at xsbt.CachedCompiler0.run(CompilerInterface.scala:126) >> at xsbt.CachedCompiler0.run(CompilerInterface.scala:102) >> at xsbt.CompilerInterface.run(CompilerInterface.scala:27) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102) >> at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48) >> at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcV$sp(AggressiveCompile.scala:106) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply(AggressiveCompile.scala:106) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply(AggressiveCompile.scala:106) >> at >> sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:179) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3.apply(AggressiveCompile.scala:105) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3.apply(AggressiveCompile.scala:102) >> at scala.Option.foreach(Option.scala:245) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:102) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:102) >> at scala.Option.foreach(Option.scala:245) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6.compileScala$1(AggressiveCompile.scala:102) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6.apply(AggressiveCompile.scala:151) >> at >> sbt.compiler.AggressiveCompile$$anonfun$6.apply(AggressiveCompile.scala:89) >> at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:40) >> at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:38) >> at sbt.inc.IncrementalCommon.cycle(Incremental.scala:103) >> at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:39) >> at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:38) >> at sbt.inc.Incremental$.manageClassfiles(Incremental.scala:69) >> at sbt.inc.Incremental$.compile(Incremental.scala:38) >> at sbt.inc.IncrementalCompile$.apply(Compile.scala:28) >> at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:170) >> at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:73) >> at >> org.jetbrains.jps.incremental.scala.local.SbtCompiler.compile(SbtCompiler.scala:66) >> at >> org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26) >> at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:62) >> at >> 
org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:20) >> at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala) >> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:606) >> at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319) >> >> I just highlighted some error message that I think important as *bold >> and red.* >> >> This really bothered me for several days, I don't know how to get >> through. Any suggestions? Thanks. >> > > -- Best Regards Jeff Zhang

Re: Exception throws when running spark pi in Intellij Idea that scala.collection.Seq is not found

2015-08-25 Thread Jeff Zhang
Jeff Zhang

Re: Submitted applications does not run.

2015-09-01 Thread Jeff Zhang
anything abnormal in logs. What would be the reason for not > availability of executors? > > On 1 September 2015 at 12:24, Madawa Soysa <madawa...@cse.mrt.ac.lk> > wrote: > >> Following are the logs available. Please find the attached. >> >> On 1 September 20

Re: Submitted applications does not run.

2015-09-01 Thread Jeff Zhang
> On 1 September 2015 at 12:05, Jeff Zhang <zjf...@gmail.com> wrote: > >> No executors ? Please check the worker logs if you are using spark >> standalone mode. >> >> On Tue, Sep 1, 2015 at 2:17 PM, Madawa Soysa <madawa...@cse.mrt.ac.lk> >> wrote:

Re: Submitted applications does not run.

2015-09-01 Thread Jeff Zhang
- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > -- Best Regards Jeff Zhang

Re: cached data between jobs

2015-09-01 Thread Jeff Zhang
se scenario? > > (It's also possible that I've simply changed something that made things > faster.) > > Eric > > -- Best Regards Jeff Zhang

Re: Submitted applications does not run.

2015-09-01 Thread Jeff Zhang
-jzhangMBPr.local.out On Tue, Sep 1, 2015 at 4:01 PM, Madawa Soysa <madawa...@cse.mrt.ac.lk> wrote: > There are no logs which includes apache.spark.deploy.worker in file name > in the SPARK_HOME/logs folder. > > On 1 September 2015 at 13:00, Jeff Zhang <zjf...@gmail.com> wrote: > >

Re: Submitted applications does not run.

2015-09-01 Thread Jeff Zhang
> When I used ./sbin/start-all.sh the start fails. I get the following error. > > failed to launch org.apache.spark.deploy.master.Master: > localhost: ssh: connect to host localhost port 22: Connection refused > > On 1 September 2015 at 13:41, Jeff Zhang <zjf...@gmail.com> wro

Re: Event logging not working when worker machine terminated

2015-09-08 Thread Jeff Zhang
--- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Task serialization error for mllib.MovieLensALS

2015-09-09 Thread Jeff Zhang
.java:1178) java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348) scala.collection.immutable.$colon$colon.writeObject(List.scala:379) -- Best Regards Jeff Zhang

Re: Setting executors per worker - Standalone

2015-09-28 Thread Jeff Zhang
nd in the standalone mode I can just set > "SPARK_WORKER_INSTANCES" and "SPARK_WORKER_CORES" and "SPARK_WORKER_MEMORY". > > Any hint or suggestion would be great. > > -- Best Regards Jeff Zhang

Re: sparkSQL Load multiple tables

2015-12-02 Thread Jeff Zhang
> Dear all, > Can you tell me how did get past SQLContext load function read multiple > tables? > > > > -- Best Regards Jeff Zhang

Re: how to skip headers when reading multiple files

2015-12-02 Thread Jeff Zhang
irement to read and process multiple text files with > headers using DataFrame API . > How can I skip headers when processing data with DataFrame API > > Thanks in advance . > Regards, > Divya > > -- Best Regards Jeff Zhang
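A common pre-csv-datasource workaround at the RDD level: drop every line equal to the first line, assuming all files share an identical header (path hypothetical):

    val data   = sc.textFile("hdfs:///data/csv/*.csv")
    val header = data.first()                 // header of the first file
    val rows   = data.filter(_ != header)     // drops the header line of every file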

Re: Can't create UDF's in spark 1.5 while running using the hive thrift service

2015-12-08 Thread Jeff Zhang
sk.java:266) >> at >> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) >> at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) >> at java.lang.Thread.run(Thread.java:745) >> >> >> >> When I ran the same against 1.4 it worked. >> >> I've also changed the spark.sql.hive.metastore.version version to be 0.13 >> (similar to what it was in 1.4) and 0.14 but I still get the same errors. >> >> >> Any suggestions? >> >> Thanks, >> Trystan >> >> > -- Best Regards Jeff Zhang

Re: SparkSQL API to insert DataFrame into a static partition?

2015-12-01 Thread Jeff Zhang
l to use dynamic partitioning function for such a case. > > > Thanks for any pointers! > > Isabelle > > > -- Best Regards Jeff Zhang

Re: Access row column by field name

2015-12-16 Thread Jeff Zhang
er/hadoop/incidents/unstructured/inc-0-500.txt") > val df = sqlContext.jsonRDD(rawIncRdd) > df.foreach(line => println(line.getString(*"field_name"*))) > > thanks for the advice > -- Best Regards Jeff Zhang

Re: [SparkR] Is rdd in SparkR deprecated ?

2015-12-14 Thread Jeff Zhang
ve for them? Would the DataFrame API sufficient? > > > > > > On Mon, Dec 14, 2015 at 4:26 AM -0800, "Jeff Zhang" <zjf...@gmail.com> > wrote: > > From the source code of SparkR, seems SparkR support rdd api. But there's > no documentation on that.

[SparkR] Is rdd in SparkR deprecated ?

2015-12-14 Thread Jeff Zhang
From the source code of SparkR, it seems SparkR supports the RDD API, but there's no documentation on it (http://spark.apache.org/docs/latest/sparkr.html). So I guess it is deprecated, is that right? -- Best Regards Jeff Zhang

Re: how to make a dataframe of Array[Doubles] ?

2015-12-14 Thread Jeff Zhang
-- > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Database does not exist: (Spark-SQL ===> Hive)

2015-12-14 Thread Jeff Zhang
base > does not exist: test_db > 15/12/14 18:49:57 ERROR HiveContext: > == > HIVE FAILURE OUTPUT > == > > > > >OK > FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. Database does not exist: test_db > > == > END HIVE FAILURE OUTPUT > == > > > Process finished with exit code 0 > > Thanks & Regards, > Gokula Krishnan* (Gokul)* > -- Best Regards Jeff Zhang

Re: hiveContext: storing lookup of partitions

2015-12-16 Thread Jeff Zhang
ivemetastore to start running queries on it (the one with > .count() or .show()) then it takes around 2 hours before the job starts in > SPARK. > > On the pyspark screen I can see that it is parsing the S3 locations for > these 2 hours. > > Regards, > Gourav > > On Wed

Re: YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

2015-12-15 Thread Jeff Zhang
rrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) > > at scala.concurrent.Await$.result(package.scala:107) > > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102) > > at > org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78) > > at > org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257) > > > -- Best Regards Jeff Zhang

Re: Spark assembly in Maven repo?

2015-12-10 Thread Jeff Zhang
mbly jar? Or should we create > & upload it to Maven central? > > > > Thanks! > > > > Xiaoyong > > > -- Best Regards Jeff Zhang

[SparkR] Any reason why saveDF's mode is append by default ?

2015-12-13 Thread Jeff Zhang
It is inconsistent with the Scala API, where the default mode is error. Any reason for that? Thanks -- Best Regards Jeff Zhang

Re: [SparkR] Any reason why saveDF's mode is append by default ?

2015-12-14 Thread Jeff Zhang
iginal PR [1]) but the Python API seems to have been > changed to match Scala / Java in > https://issues.apache.org/jira/browse/SPARK-6366 > > Feel free to open a JIRA / PR for this. > > Thanks > Shivaram > > [1] https://github.com/amplab-extras/SparkR-pkg/pull/199/files > >

Re: Can't create UDF through thriftserver, no error reported

2015-12-15 Thread Jeff Zhang
ld post to support > troubleshooting? Is this JIRA-worthy? Thanks > > Antonio > > > > -- Best Regards Jeff Zhang

Re: hiveContext: storing lookup of partitions

2015-12-15 Thread Jeff Zhang
the partition lookups? > > Currently it takes around 1.5 hours for me just to cache in the partition > information and after that I can see that the job gets queued in the SPARK > UI. > > Regards, > Gourav > -- Best Regards Jeff Zhang

Re: sql:Exception in thread "main" scala.MatchError: StringType

2016-01-03 Thread Jeff Zhang
} > ] > } > > ___ > Exception in thread "main" scala.MatchError: StringType (of class > org.apache.spark.sql.types.StringType$) > at > org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58) > at > > org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139) > > ___ > why > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/sql-Exception-in-thread-main-scala-MatchError-StringType-tp25868.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Can spark.scheduler.pool be applied globally ?

2016-01-05 Thread Jeff Zhang
. Is there any way to do that? Or am I missing anything here? -- Best Regards Jeff Zhang
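The pool is a thread-local property on the SparkContext rather than a global setting, which is what this thread is probing. A minimal sketch ("production" is a hypothetical pool defined in fairscheduler.xml):

    sc.setLocalProperty("spark.scheduler.pool", "production")
    try {
      sc.parallelize(1 to 1000000).count()    // jobs from this thread use the pool
    } finally {
      sc.setLocalProperty("spark.scheduler.pool", null)  // reset for this thread
    }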

Re: Cannot get repartitioning to work

2016-01-01 Thread Jeff Zhang
ubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Can spark.scheduler.pool be applied globally ?

2016-01-05 Thread Jeff Zhang
actually be SchedulingMode.FIFO > if you haven't changed the code: > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L65 > > On Tue, Jan 5, 2016 at 5:29 PM, Jeff Zhang <zjf...@gmail.com> wrote: > >> Right,

Re: Need Help in Spark Hive Data Processing

2016-01-06 Thread Jeff Zhang
es. > > > I have 5 node Spark cluster each with 30 GB memory. i am want to process > hive table with 450GB data using DataFrames. To fetch single row from Hive > table its taking 36 mins. Pls suggest me what wrong here and any help is > appreciated. > > > Thanks > Bala > > > -- Best Regards Jeff Zhang

Re: Can spark.scheduler.pool be applied globally ?

2016-01-05 Thread Jeff Zhang
park/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L90 > ). > > > On Tue, Jan 5, 2016 at 4:15 PM, Jeff Zhang <zjf...@gmail.com> wrote: > >> Sorry, I don't make it clearly. What I want is the default pool is fair >>

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread Jeff Zhang
월 5일 (화) 오후 2:27, Julio Antonio Soto de Vicente < >>>>>>>>>>> ju...@esbet.es>님이 작성: >>>>>>>>>>> >>>>>>>>>>>> Unfortunately, Koert is right. >>>>>>>>>>>> >>>>>>>>>>>> I've been in a couple of projects using Spark (banking >>>>>>>>>>>> industry) where CentOS + Python 2.6 is the toolbox available. >>>>>>>>>>>> >>>>>>>>>>>> That said, I believe it should not be a concern for Spark. >>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the >>>>>>>>>>>> Spark >>>>>>>>>>>> philosophy IMO. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> El 5 ene 2016, a las 20:07, Koert Kuipers <ko...@tresata.com> >>>>>>>>>>>> escribió: >>>>>>>>>>>> >>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesnt it? >>>>>>>>>>>> >>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6 >>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work >>>>>>>>>>>> >>>>>>>>>>>> so i think its a bad idea >>>>>>>>>>>> >>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland < >>>>>>>>>>>> juliet.hougl...@gmail.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python >>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is >>>>>>>>>>>>> encouraged. >>>>>>>>>>>>> Most organizations acknowledge the 2.7 is common, but lagging >>>>>>>>>>>>> behind the version they should theoretically use. Dropping python >>>>>>>>>>>>> 2.6 >>>>>>>>>>>>> support sounds very reasonable to me. >>>>>>>>>>>>> >>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas < >>>>>>>>>>>>> nicholas.cham...@gmail.com> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> +1 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Red Hat supports Python 2.6 on REHL 5 until 2020 >>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>, >>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core >>>>>>>>>>>>>> Python >>>>>>>>>>>>>> developers stopped supporting it in 2013. REHL 5 is not a good >>>>>>>>>>>>>> enough >>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I >>>>>>>>>>>>>> believe we currently do). >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nick >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang < >>>>>>>>>>>>>> allenzhang...@126.com> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> plus 1, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> we are currently using python 2.7.2 in production >>>>>>>>>>>>>>> environment. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> 在 2016-01-05 18:11:45,"Meethu Mathew" < >>>>>>>>>>>>>>> meethu.mat...@flytxt.com> 写道: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> +1 >>>>>>>>>>>>>>> We use Python 2.7 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Meethu Mathew >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin < >>>>>>>>>>>>>>> r...@databricks.com> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python >>>>>>>>>>>>>>>> 2.6 in Spark 2.0? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects >>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some >>>>>>>>>>>>>>>> libraries that >>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince >>>>>>>>>>>>>>>> the library >>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm >>>>>>>>>>>>>>>> curious if >>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark. 
>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>> >>> >>> >> > -- Best Regards Jeff Zhang

Re: Can spark.scheduler.pool be applied globally ?

2016-01-05 Thread Jeff Zhang
're using fair scheduling and don't set a pool, > the default pool will be used. > > On Tue, Jan 5, 2016 at 1:57 AM, Jeff Zhang <zjf...@gmail.com> wrote: > >> >> It seems currently spark.scheduler.pool must be set as localProperties >> (associate with thr

Re: DataFrame operations

2015-12-20 Thread Jeff Zhang
;hse") return a column not the row data > What am I missing here? > -- Best Regards Jeff Zhang

Re: Spark batch getting hung up

2015-12-20 Thread Jeff Zhang
fore it gets cleared up. > > Would the driver not wait till all the stuff related to test1 is completed > before calling test2 as test2 is dependent on test1? > > val test1 =RDD1.mapPartitions.() > > val test2 = test1.mapPartititions() > > On Sat, Dec 19, 2015 at 12:24 AM, Jeff Zh
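The behavior in question follows from lazy evaluation: mapPartitions only builds lineage, and nothing runs until an action. A minimal sketch assuming an RDD[Int]:

    val test1 = rdd.mapPartitions(iter => iter.map(_ * 2))       // lazy: nothing runs yet
    val test2 = test1.mapPartitions(iter => iter.filter(_ > 10)) // still lazy

    val n = test2.count()  // the action: test1 executes here, as part of test2's lineage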

Re: should I file a bug? Re: trouble implementing Transformer and calling DataFrame.withColumn()

2015-12-22 Thread Jeff Zhang
Exception { > > logger.info("AEDWIP s:{}", s); > > String ret = s.equalsIgnoreCase(category1) ? category1 : > category3; > > return ret; > > } > > } > > > public class Features implements Serializab

Re: Missing dependencies when submitting scala app

2015-12-22 Thread Jeff Zhang
efor can't find the parse method > > Any idea on how to solve this depdendency problem? > > thanks in advance > -- Best Regards Jeff Zhang

Re: Passing parameters to spark SQL

2015-12-27 Thread Jeff Zhang
@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Opening Dynamic Scaling Executors on Yarn

2015-12-27 Thread Jeff Zhang
jar and restart all the namenode on yarn ? > > > > Thanks a lot. > > > > Mars > > > -- Best Regards Jeff Zhang
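Dynamic allocation needs the external shuffle service on every NodeManager (it is the NodeManagers, not the NameNodes, that get restarted). A hedged sketch of the usual yarn-site.xml entries, with spark-<version>-yarn-shuffle.jar on the NodeManager classpath and spark.dynamicAllocation.enabled / spark.shuffle.service.enabled set to true on the Spark side:

    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle,spark_shuffle</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
      <value>org.apache.spark.network.yarn.YarnShuffleService</value>
    </property>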

Re: Can anyone explain Spark behavior for below? Kudos in Advance

2015-12-27 Thread Jeff Zhang
"345",""),2) > z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x > + y) > res143: String = 10 > > Scenario2: > val z = sc.parallelize(List("12","23","","345"),2) > z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x > + y) > res144: String = 11 > > why the result is different. I was expecting 10 for both. also for the > first Partition > -- Best Regards Jeff Zhang
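A worked trace, under my reading of aggregate's semantics: the zero value "" is folded into each partition separately, so moving the empty string between partitions changes the per-partition results (combine order assumed to follow partition index):

    val seqOp  = (x: String, y: String) => math.min(x.length, y.length).toString
    val combOp = (x: String, y: String) => x + y

    // Scenario 1: partitions ("12","23") and ("345","")
    //   p0: "" -> min(0,2)="0" -> min(1,2)="1"  => "1"
    //   p1: "" -> min(0,3)="0" -> min(1,0)="0"  => "0"
    //   combine: "" + "1" + "0" = "10"
    sc.parallelize(List("12","23","345",""), 2).aggregate("")(seqOp, combOp)

    // Scenario 2: partitions ("12","23") and ("","345")
    //   p1: "" -> min(0,0)="0" -> min(1,3)="1"  => "1"
    //   combine: "" + "1" + "1" = "11"
    sc.parallelize(List("12","23","","345"), 2).aggregate("")(seqOp, combOp)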

Re: Is there anyway to log properties from a Spark application

2015-12-28 Thread Jeff Zhang
che Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Is there anyway to log properties from a Spark application

2015-12-28 Thread Jeff Zhang
'm launching my applications through YARN. Where will these properties be > logged?. I guess they wont be part of YARN logs > > 2015-12-28 13:22 GMT+01:00 Jeff Zhang <zjf...@gmail.com>: > >> set spark.logConf as true in spark-default.conf will log the property in >> driver sid
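The property in question is a one-line entry in spark-defaults.conf; when enabled, the driver logs the effective SparkConf at INFO as the context starts, so in yarn-client mode it appears in the submitting client's output rather than in the YARN container logs:

    # spark-defaults.conf
    spark.logConf  true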

Re: spark-submit for dependent jars

2015-12-21 Thread Jeff Zhang
he.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) > at > org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > > Regards, > Rajesh > -- Best Regards Jeff Zhang

Re: spark-submit for dependent jars

2015-12-21 Thread Jeff Zhang
t com.cisco.ss.etl.Main.main(Main.scala) >>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >>> at >>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >>> at >>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >>> at java.lang.reflect.Method.invoke(Method.java:606) >>> at >>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672) >>> at >>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) >>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) >>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120) >>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) >>> >>> Regards, >>> Rajesh >>> >> >> > -- Best Regards Jeff Zhang

Re: trouble implementing Transformer and calling DataFrame.withColumn()

2015-12-21 Thread Jeff Zhang
super(f, dataType, inputTypes); > > ??? Why do I have to implement this constructor ??? > > ??? What are the arguments ??? > > } > > > > @Override > > public > > Column apply(scala.collection.Seq exprs) { > > What do you do with a scala seq? > > return ???; > > } > > } > > } > > > -- Best Regards Jeff Zhang

Re: get parameters of spark-submit

2015-12-21 Thread Jeff Zhang
nsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Spark batch getting hung up

2015-12-19 Thread Jeff Zhang
g-up-tp25735.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Dynamic jar loading

2015-12-19 Thread Jeff Zhang
; > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > > -- Best Regards Jeff Zhang

Re: Base ERROR

2015-12-17 Thread Jeff Zhang
1:24:29,942 INFO [Thread-6] regionserver.ShutdownHook: > Starting fs shutdown hook thread. > 2015-12-17 21:24:29,953 INFO [Thread-6] regionserver.ShutdownHook: > Shutdown hook finished. > > > -- Best Regards Jeff Zhang

Re: Stop Spark yarn-client job

2015-11-26 Thread Jeff Zhang
ee the job still running (in > yarn resource manager) after final hive insert is complete. > > The code flow is > > start context > do somework > insert to hive > sc.stop > > This is sparkling water job is that matters. > > Is there anything else needed ? > > Thanks, > > J > > > -- Best Regards Jeff Zhang

Re: Adding new column to Dataframe

2015-11-25 Thread Jeff Zhang
> scala> df.withColumn("line2",df2("line")) > > org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330 > missing from line#2326 in operator !Project [line#2326,line#2330 AS > line2#2331]; > > Thanks and Regards, > Vishnu Viswanath > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>* > -- Best Regards Jeff Zhang
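The "resolved attribute(s) ... missing" error arises because withColumn can only reference columns of the DataFrame it is called on; derive the column from the same frame, or join when it genuinely lives in another one. A sketch assuming a shared "id" key:

    import org.apache.spark.sql.functions._

    // Works: the new column is derived from df's own columns.
    val withUpper = df.withColumn("line2", upper(df("line")))

    // A column from another DataFrame needs a join instead of withColumn.
    val joined = df.join(df2.withColumnRenamed("line", "line2"), "id")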

Re: General question on using StringIndexer in SparkML

2015-11-29 Thread Jeff Zhang
ta which I need to predict. >> And >> > suppose the new dataset has C as the most frequent word, followed by B >> and >> > A. So the StringIndexer will assign index as >> > >> > C 0.0 >> > B 1.0 >> > A 2.0 >> > >> > These indexes are different from what we used for modeling. So won’t >> this >> > give me a wrong prediction if I use StringIndexer? >> > >> > -- >> > Thanks and Regards, >> > Vishnu Viswanath, >> > www.vishnuviswanath.com >> > > > > -- > Thanks and Regards, > Vishnu Viswanath, > *www.vishnuviswanath.com <http://www.vishnuviswanath.com>* > -- Best Regards Jeff Zhang

No documentation for how to write custom Transformer in ml pipeline ?

2015-11-30 Thread Jeff Zhang
. Is this because the interface is still unstable now? -- Best Regards Jeff Zhang
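In the absence of documentation, a minimal sketch of a custom Transformer for the one-input-one-output case via UnaryTransformer (the transformer name and logic here are hypothetical):

    import org.apache.spark.ml.UnaryTransformer
    import org.apache.spark.ml.util.Identifiable
    import org.apache.spark.sql.types.{DataType, StringType}

    // Upper-cases a string column.
    class UpperCaser(override val uid: String)
        extends UnaryTransformer[String, String, UpperCaser] {

      def this() = this(Identifiable.randomUID("upperCaser"))

      override protected def createTransformFunc: String => String = _.toUpperCase

      override protected def outputDataType: DataType = StringType
    }

    // Usage: new UpperCaser().setInputCol("text").setOutputCol("textUpper").transform(df)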

Re: Spark on yarn vs spark standalone

2015-11-26 Thread Jeff Zhang
data? > > Many thanks in advance for any responses. > > Cheers! > -- Best Regards Jeff Zhang

Re: Optimizing large collect operations

2015-11-26 Thread Jeff Zhang
t; > > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Optimizing-large-collect-operations-tp25498.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: Millions of entities in custom Hadoop InputFormat and broadcast variable

2015-11-27 Thread Jeff Zhang
m performance perspective, what enhancement I can make > to make it better? > > Thanks > > -- > --Anfernee > -- Best Regards Jeff Zhang

Re: Re: driver ClassNotFoundException when MySQL JDBC exceptions are thrown on executor

2015-11-19 Thread Jeff Zhang
xception > at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49) > at repro.Repro$$anonfun$main$2.apply

Re: has any spark write orc document

2015-11-19 Thread Jeff Zhang
the parquet document. > > http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files > > Thanks > -- Best Regards Jeff Zhang

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
>> >> >> -- >> View this message in context: >> http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-DataFrame-Out-of-memory-exception-for-very-small-file-tp25435.html >> Sent from the Apache Spark User List mailing list archive at Nabble.com. >> >> - >> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional >> commands, e-mail: user-h...@spark.apache.org >> >> > > > -- > Regards, > Vipul Rai > www.vipulrai.me > +91-8892598819 > <http://in.linkedin.com/in/vipulrai/> > -- Best Regards Jeff Zhang

Re: SparkR DataFrame , Out of memory exception for very small file.

2015-11-23 Thread Jeff Zhang
gards, > Vipul > > On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote: > >> >>> Do I need to create a new DataFrame for every update to the >> DataFrame like >> addition of new column or need to update the original sales DataFrame

Re: java.io.FileNotFoundException

2016-06-03 Thread Jeff Zhang
/application_1463194314221_211370/spark-3cc37dc7-fa3c-4b98-aa60-0acdfc79c725/28/shuffle_8553_38_0.index >> (No such file or directory) >> >> any idea about this error ? >> -- >> Thanks, >> Kishore. >> > > > > -- > Thanks, > Kishore. > -- Best Regards Jeff Zhang

Re: Spark corrupts text lines

2016-06-14 Thread Jeff Zhang
bscribe, e-mail: user-unsubscr...@spark.apache.org > > For additional commands, e-mail: user-h...@spark.apache.org > > > > - > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org > For additional commands, e-mail: user-h...@spark.apache.org > > -- Best Regards Jeff Zhang

Re: sqlcontext - not able to connect to database

2016-06-14 Thread Jeff Zhang
gt; at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:209) > at java.lang.Thread.run(Thread.java:745) > > > -- Best Regards Jeff Zhang

Bug of PolynomialExpansion ?

2016-05-29 Thread Jeff Zhang
, x2*x3, x3*x1, x3*x2, x3*x3) (3,[0,2],[1.0,1.0]) --> (9,[0,1,5,6,8],[1.0,1.0,1.0,1.0,1.0]) -- Best Regards Jeff Zhang
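My reading is that this is term ordering rather than a bug: for degree 2 the expansion is emitted as (x1, x1^2, x2, x1*x2, x2^2, x3, x1*x3, x2*x3, x3^2), so x1 = x3 = 1 yields nonzeros exactly at indices 0, 1, 5, 6, 8, matching the output above. A sketch reproducing it, assuming a sqlContext in scope:

    import org.apache.spark.ml.feature.PolynomialExpansion
    import org.apache.spark.mllib.linalg.Vectors

    val df = sqlContext.createDataFrame(Seq(
      (Vectors.sparse(3, Array(0, 2), Array(1.0, 1.0)), 0)
    )).toDF("features", "label")

    val px = new PolynomialExpansion()
      .setInputCol("features").setOutputCol("expanded").setDegree(2)

    // Expected: (9,[0,1,5,6,8],[1.0,1.0,1.0,1.0,1.0])
    px.transform(df).select("expanded").show(false)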

Re: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

2016-06-22 Thread Jeff Zhang
examples.SparkPi >> --master yarn-client --driver-memory 512m --num-executors 2 >> --executor-memory 512m --executor-cores 210: >> >> >> >>- Error: Could not find or load main class >>org.apache.spark.deploy.yarn.ExecutorLauncher >> >> but i don't config that para ,there no error why???that para is only >> avoid Uploading resource file(jar package)?? >> > > -- Best Regards Jeff Zhang

Re: Does saveAsHadoopFile depend on master?

2016-06-21 Thread Jeff Zhang
thoughts on how to track down > what is happening here? > > Thanks! > > Pierre. > -- Best Regards Jeff Zhang
