Hi folks,
I found that the RDD saveXXFile methods have no overwrite flag, which I think would be very
helpful. Is there any reason for this?
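A common workaround, for what it's worth (a minimal sketch, not from this thread; the output path and the rdd are made up), is to delete the output directory through the Hadoop FileSystem API before saving:

import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical output location; delete it first so saveAsTextFile doesn't fail.
val outputPath = new Path("hdfs:///tmp/output")
val fs = FileSystem.get(sc.hadoopConfiguration)
if (fs.exists(outputPath)) fs.delete(outputPath, true)   // recursive delete
rdd.saveAsTextFile(outputPath.toString)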
--
Best Regards
Jeff Zhang
Yes, operations like repartition() and coalesce() change the number of partitions of RDDs, possibly requiring a shuffle.
On Tue, Mar 3, 2015 at 10:21 AM, Jeff Zhang zjf...@gmail.com wrote:
I mean, is it possible to change the partition number at runtime? Thanks
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
of execution slots).
If you know a stage needs unusually high parallelism, for example, you can
repartition further for that stage.
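A minimal sketch of that idea (the input path and the partition count are made up):

// Earlier, cheaper stages keep the input's default partitioning.
val lines = sc.textFile("hdfs:///data/input")
// Repartition just before a stage that needs unusually high parallelism;
// 400 here is only an example value.
val counts = lines.repartition(400)
  .map(line => (line, line.length))
  .reduceByKey(_ + _)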
On Mar 4, 2015 1:50 AM, Jeff Zhang zjf...@gmail.com wrote:
Thanks Sean.
But if the partitions of an RDD are determined beforehand, wouldn't that be inflexible at runtime?
val (qtSessionsWithQt, guidUidMapSessions) = rawQtSession.
magicFilter(_._2.qualifiedTreatmentId != NULL_VALUE)
--
Deepak
--
Best Regards
Jeff Zhang
:32 AM Jeff Zhang zjf...@gmail.com wrote:
As far as I know, Spark doesn't support multiple outputs.
On Wed, Jun 3, 2015 at 2:15 PM, ayan guha guha.a...@gmail.com wrote:
Why do you need to do that if the filter and the content of the resulting RDDs
are exactly the same? You may as well declare them as one RDD.
,
Patcharee
--
Best Regards
Jeff Zhang
with TaskDescription.
Regards.
--
Best Regards
Jeff Zhang
local --jars postgresql-9.4-1201.jar -i ScriptFile
Please let me know what is missing in my code, as my resultant Array is
empty
Regards,
Satish
--
Best Regards
Jeff Zhang
$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
Process finished with exit code 1
Thanks,
Xiaohe
--
Best Regards
Jeff Zhang
...@gmail.com wrote:
Hi,
Is Spark master high availability supported on YARN (yarn-client mode)
analogous to
https://spark.apache.org/docs/1.4.0/spark-standalone.html#high-availability
?
Thanks
Bhaskie
--
Best Regards
Jeff Zhang
]
== Physical Plan ==
Scan
JSONRelation[file:/Users/hadoop/github/spark/examples/src/main/resources/people.json][age#0L,name#1]
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
--
Thanks Regards,
Ashwin Giridharan
--
Best Regards
Jeff Zhang
Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
By default it is in ${SPARK_HOME}/work/${APP_ID}/${EXECUTOR_ID}
On Thu, Jul 16, 2015 at 3:43 PM, Tao Lu taolu2...@gmail.com wrote:
Hi, Guys,
Where can I find the console log file of CoarseGrainedExecutorBackend
process?
Thanks!
Tao
--
Best Regards
Jeff Zhang
Can you provide the schema while loading the json file, like below:
sqlContext.read.schema(xxx).json(“…”)?
Hao
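For illustration, a sketch of that suggestion against the people.json example file (field names assumed from that example); with an explicit schema the reader can skip the inference pass:

import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("age", LongType, nullable = true),
  StructField("name", StringType, nullable = true)))
// No extra job is needed to infer the schema.
val df = sqlContext.read.schema(schema).json("examples/src/main/resources/people.json")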
*From:* Jeff Zhang [mailto:zjf...@gmail.com]
*Sent:* Monday, August 24, 2015 6:20 PM
*To:* user@spark.apache.org
*Subject:* DataFrame#show cost 2 Spark Jobs ?
It's weird to me
that I don't know.
--
Best Regards
Jeff Zhang
>> You can use the RDD#union() (or ++) method to concatenate
>> multiple RDDs. For example:
>>
>> val lines1 = sc.textFile("file1")
>> val lines2 = sc.textFile("file2")
>>
>> val rdd = lines1 union lines2
>>
>> regards,
>> --Jakob
>>
Didn't notice that I can pass comma-separated paths to the existing API
(SparkContext#textFile), so there's no need for a new API. Thanks all.
On Thu, Nov 12, 2015 at 10:24 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> Hi Pradeep
>
> >>> Looks like what I was suggesting doesn't work. :
> Because they're not a library? They're example code, not
> something you build an app on.
>
> On Mon, Nov 16, 2015 at 9:27 AM, Jeff Zhang <zjf...@gmail.com> wrote:
> > I don't find spark examples jar in maven repository after 1.1.1. Any
> reason
> > for that ?
> >
> > http:
I don't see the Spark examples jar in the Maven repository after 1.1.1. Any reason
for that?
http://mvnrepository.com/artifact/org.apache.spark/spark-examples_2.10
--
Best Regards
Jeff Zhang
> list. I haven't tried this, but I think you should just be able to do
> sc.textFile("file1,file2,...")
>
> On Wed, Nov 11, 2015 at 4:30 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> I know these workaround, but wouldn't it be more convenient and
>> straight
--
Best Regards
Jeff Zhang
dow,k.srch_adults_count,
> k.srch_children_count,k.srch_room_count),
> (k[0:54])))
> BB = B.groupByKey()
> BB.take(1)
>
>
> best fahad
>
>
--
Best Regards
Jeff Zhang
I can do it with the Scala API, but I'm not sure about the syntax in PySpark.
(Didn't find it in the Python API docs.)
Here's what I tried; both failed:
>>> df.filter(df.age>3 & df.name=="Andy").collect()
>>> df.filter(df.age>3 and df.name=="Andy").collect()
--
Best Regards
Jeff Zhang
;
> - Philip
>
>
--
Best Regards
Jeff Zhang
BTW, I think the JSON parser should at least verify the JSON format when
inferring the schema.
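For context, a sketch of the format the reader expects (the file name and contents are made up):

// OK: each line is a complete, self-contained JSON object.
//   {"name": "Andy", "age": 30}
//   {"name": "Justin", "age": 19}
// Not OK for inference: one object pretty-printed across several lines.
val df = sqlContext.read.json("hdfs:///data/people.jsonl")
df.printSchema()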
On Wed, Oct 21, 2015 at 12:59 PM, Jeff Zhang <zjf...@gmail.com> wrote:
> I think this is due to the json file format. DataFrame can only accept a
> json file with one valid record per line
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1457)
> > at
> >
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1418)
> > at
> org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> >
> > It seems like this issue has been resolved in scala per SPARK-3390
> > <https://issues.apache.org/jira/browse/SPARK-3390> ; any thoughts on
> the
> > root cause of this in pyspark?
> >
> >
> >
> >
--
Best Regards
Jeff Zhang
.
--
Best Regards
Jeff Zhang
.Symbols$ClassSymbol.companionModule(Symbols.scala:2991)
>> at
>> scala.tools.nsc.backend.jvm.GenASM$JPlainBuilder.genClass(GenASM.scala:1371)
>> at scala.tools.nsc.backend.jvm.GenASM$AsmPhase.run(GenASM.scala:120)
>> at scala.tools.nsc.Global$Run.compileUnitsInternal(Global.scala:1583)
>> at scala.tools.nsc.Global$Run.compileUnits(Global.scala:1557)
>> at scala.tools.nsc.Global$Run.compileSources(Global.scala:1553)
>> at scala.tools.nsc.Global$Run.compile(Global.scala:1662)
>> at xsbt.CachedCompiler0.run(CompilerInterface.scala:126)
>> at xsbt.CachedCompiler0.run(CompilerInterface.scala:102)
>> at xsbt.CompilerInterface.run(CompilerInterface.scala:27)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:102)
>> at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:48)
>> at sbt.compiler.AnalyzingCompiler.compile(AnalyzingCompiler.scala:41)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply$mcV$sp(AggressiveCompile.scala:106)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply(AggressiveCompile.scala:106)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3$$anonfun$apply$1.apply(AggressiveCompile.scala:106)
>> at
>> sbt.compiler.AggressiveCompile.sbt$compiler$AggressiveCompile$$timed(AggressiveCompile.scala:179)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3.apply(AggressiveCompile.scala:105)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1$$anonfun$apply$3.apply(AggressiveCompile.scala:102)
>> at scala.Option.foreach(Option.scala:245)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:102)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6$$anonfun$compileScala$1$1.apply(AggressiveCompile.scala:102)
>> at scala.Option.foreach(Option.scala:245)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6.compileScala$1(AggressiveCompile.scala:102)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6.apply(AggressiveCompile.scala:151)
>> at
>> sbt.compiler.AggressiveCompile$$anonfun$6.apply(AggressiveCompile.scala:89)
>> at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:40)
>> at sbt.inc.IncrementalCompile$$anonfun$doCompile$1.apply(Compile.scala:38)
>> at sbt.inc.IncrementalCommon.cycle(Incremental.scala:103)
>> at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:39)
>> at sbt.inc.Incremental$$anonfun$1.apply(Incremental.scala:38)
>> at sbt.inc.Incremental$.manageClassfiles(Incremental.scala:69)
>> at sbt.inc.Incremental$.compile(Incremental.scala:38)
>> at sbt.inc.IncrementalCompile$.apply(Compile.scala:28)
>> at sbt.compiler.AggressiveCompile.compile2(AggressiveCompile.scala:170)
>> at sbt.compiler.AggressiveCompile.compile1(AggressiveCompile.scala:73)
>> at
>> org.jetbrains.jps.incremental.scala.local.SbtCompiler.compile(SbtCompiler.scala:66)
>> at
>> org.jetbrains.jps.incremental.scala.local.LocalServer.compile(LocalServer.scala:26)
>> at org.jetbrains.jps.incremental.scala.remote.Main$.make(Main.scala:62)
>> at
>> org.jetbrains.jps.incremental.scala.remote.Main$.nailMain(Main.scala:20)
>> at org.jetbrains.jps.incremental.scala.remote.Main.nailMain(Main.scala)
>> at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at com.martiansoftware.nailgun.NGSession.run(NGSession.java:319)
>>
>> I just highlighted some error messages that I think are important, in *bold
>> and red*.
>>
>> This has really bothered me for several days, and I don't know how to get
>> through it. Any suggestions? Thanks.
>>
>
>
--
Best Regards
Jeff Zhang
> anything abnormal in the logs. What would be the reason for the executors
> not being available?
>
> On 1 September 2015 at 12:24, Madawa Soysa <madawa...@cse.mrt.ac.lk>
> wrote:
>
>> Following are the logs available. Please find the attached.
>>
>> On 1 September 20
> On 1 September 2015 at 12:05, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> No executors ? Please check the worker logs if you are using spark
>> standalone mode.
>>
>> On Tue, Sep 1, 2015 at 2:17 PM, Madawa Soysa <madawa...@cse.mrt.ac.lk>
>> wrote:
-
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
--
Best Regards
Jeff Zhang
se scenario?
>
> (It's also possible that I've simply changed something that made things
> faster.)
>
> Eric
>
>
--
Best Regards
Jeff Zhang
-jzhangMBPr.local.out
On Tue, Sep 1, 2015 at 4:01 PM, Madawa Soysa <madawa...@cse.mrt.ac.lk>
wrote:
> There are no logs which include apache.spark.deploy.worker in the file name
> in the SPARK_HOME/logs folder.
>
> On 1 September 2015 at 13:00, Jeff Zhang <zjf...@gmail.com> wrote:
>
>
> When I used ./sbin/start-all.sh, the start failed. I get the following error.
>
> failed to launch org.apache.spark.deploy.master.Master:
> localhost: ssh: connect to host localhost port 22: Connection refused
>
> On 1 September 2015 at 13:41, Jeff Zhang <zjf...@gmail.com> wro
--
Best Regards
Jeff Zhang
.java:1178)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
--
Best Regards
Jeff Zhang
> And in standalone mode I can just set
> "SPARK_WORKER_INSTANCES" and "SPARK_WORKER_CORES" and "SPARK_WORKER_MEMORY".
>
> Any hint or suggestion would be great.
>
>
--
Best Regards
Jeff Zhang
> Dear all,
> Can you tell me how to get the SQLContext load function to read multiple
> tables?
>
>
>
>
--
Best Regards
Jeff Zhang
> I have a requirement to read and process multiple text files with
> headers using the DataFrame API.
> How can I skip headers when processing the data with the DataFrame API?
>
> Thanks in advance .
> Regards,
> Divya
>
>
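One possible way to handle the headers (a sketch, assuming the external spark-csv data source is available; the option names are that package's, and the path is made up):

// Treat the first line of each file as a header: it is skipped for data
// and used for the column names instead.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("hdfs:///data/files/*.txt")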
--
Best Regards
Jeff Zhang
sk.java:266)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>>
>> When I ran the same against 1.4 it worked.
>>
>> I've also changed the spark.sql.hive.metastore.version version to be 0.13
>> (similar to what it was in 1.4) and 0.14 but I still get the same errors.
>>
>>
>> Any suggestions?
>>
>> Thanks,
>> Trystan
>>
>>
>
--
Best Regards
Jeff Zhang
l to use dynamic partitioning function for such a case.
>
>
> Thanks for any pointers!
>
> Isabelle
>
>
>
--
Best Regards
Jeff Zhang
er/hadoop/incidents/unstructured/inc-0-500.txt")
> val df = sqlContext.jsonRDD(rawIncRdd)
> df.foreach(line => println(line.getString("field_name")))
>
> thanks for the advice
>
--
Best Regards
Jeff Zhang
ve for them? Would the DataFrame API be sufficient?
>
>
>
>
>
> On Mon, Dec 14, 2015 at 4:26 AM -0800, "Jeff Zhang" <zjf...@gmail.com>
> wrote:
>
> From the source code of SparkR, seems SparkR support rdd api. But there's
> no documentation on that.
From the source code of SparkR, it seems SparkR supports an RDD API, but there's
no documentation on that ( http://spark.apache.org/docs/latest/sparkr.html
), so I guess it is deprecated. Is that right?
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
base
> does not exist: test_db
> 15/12/14 18:49:57 ERROR HiveContext:
> ==
> HIVE FAILURE OUTPUT
> ==
>
>
>
>
>OK
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.DDLTask. Database does not exist: test_db
>
> ==
> END HIVE FAILURE OUTPUT
> ==
>
>
> Process finished with exit code 0
>
> Thanks & Regards,
> Gokula Krishnan* (Gokul)*
>
--
Best Regards
Jeff Zhang
ivemetastore to start running queries on it (the one with
> .count() or .show()) then it takes around 2 hours before the job starts in
> SPARK.
>
> On the pyspark screen I can see that it is parsing the S3 locations for
> these 2 hours.
>
> Regards,
> Gourav
>
> On Wed
rrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
>
> at scala.concurrent.Await$.result(package.scala:107)
>
> at
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
>
> at
> org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
>
> at
> org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.stopExecutors(CoarseGrainedSchedulerBackend.scala:257)
>
>
>
--
Best Regards
Jeff Zhang
> assembly jar? Or should we create
> & upload it to Maven central?
>
>
>
> Thanks!
>
>
>
> Xiaoyong
>
>
>
--
Best Regards
Jeff Zhang
It is inconsistent with the Scala API, which is error by default. Any reason for
that? Thanks
--
Best Regards
Jeff Zhang
iginal PR [1]) but the Python API seems to have been
> changed to match Scala / Java in
> https://issues.apache.org/jira/browse/SPARK-6366
>
> Feel free to open a JIRA / PR for this.
>
> Thanks
> Shivaram
>
> [1] https://github.com/amplab-extras/SparkR-pkg/pull/199/files
>
>
ld post to support
> troubleshooting? Is this JIRA-worthy? Thanks
>
> Antonio
>
>
>
>
--
Best Regards
Jeff Zhang
the partition lookups?
>
> Currently it takes around 1.5 hours for me just to cache in the partition
> information and after that I can see that the job gets queued in the SPARK
> UI.
>
> Regards,
> Gourav
>
--
Best Regards
Jeff Zhang
}
> ]
> }
>
> ___
> Exception in thread "main" scala.MatchError: StringType (of class
> org.apache.spark.sql.types.StringType$)
> at
> org.apache.spark.sql.json.InferSchema$.apply(InferSchema.scala:58)
> at
>
> org.apache.spark.sql.json.JSONRelation$$anonfun$schema$1.apply(JSONRelation.scala:139)
>
> ___
> why
>
>
>
--
Best Regards
Jeff Zhang
. Is there any way to do that? Or am I missing anything here?
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
actually be SchedulingMode.FIFO
> if you haven't changed the code:
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L65
>
> On Tue, Jan 5, 2016 at 5:29 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Right,
es.
>
>
> I have a 5-node Spark cluster, each node with 30 GB of memory. I want to process
> a Hive table with 450 GB of data using DataFrames. Fetching a single row from the
> Hive table takes 36 minutes. Please suggest what is wrong here; any help is
> appreciated.
>
>
> Thanks
> Bala
>
>
>
--
Best Regards
Jeff Zhang
park/blob/master/core/src/main/scala/org/apache/spark/scheduler/SchedulableBuilder.scala#L90
> ).
>
>
> On Tue, Jan 5, 2016 at 4:15 PM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> Sorry, I didn't make it clear. What I want is for the default pool to be fair
>>>>>>>>>>> On Tue, Jan 5, 2016 at 2:27 PM, Julio Antonio Soto de Vicente <
>>>>>>>>>>> ju...@esbet.es> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately, Koert is right.
>>>>>>>>>>>>
>>>>>>>>>>>> I've been in a couple of projects using Spark (banking
>>>>>>>>>>>> industry) where CentOS + Python 2.6 is the toolbox available.
>>>>>>>>>>>>
>>>>>>>>>>>> That said, I believe it should not be a concern for Spark.
>>>>>>>>>>>> Python 2.6 is old and busted, which is totally opposite to the
>>>>>>>>>>>> Spark
>>>>>>>>>>>> philosophy IMO.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Jan 5, 2016, at 20:07, Koert Kuipers <ko...@tresata.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> rhel/centos 6 ships with python 2.6, doesn't it?
>>>>>>>>>>>>
>>>>>>>>>>>> if so, i still know plenty of large companies where python 2.6
>>>>>>>>>>>> is the only option. asking them for python 2.7 is not going to work
>>>>>>>>>>>>
>>>>>>>>>>>> so i think its a bad idea
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 5, 2016 at 1:52 PM, Juliet Hougland <
>>>>>>>>>>>> juliet.hougl...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see a reason Spark 2.0 would need to support Python
>>>>>>>>>>>>> 2.6. At this point, Python 3 should be the default that is
>>>>>>>>>>>>> encouraged.
>>>>>>>>>>>>> Most organizations acknowledge the 2.7 is common, but lagging
>>>>>>>>>>>>> behind the version they should theoretically use. Dropping python
>>>>>>>>>>>>> 2.6
>>>>>>>>>>>>> support sounds very reasonable to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 5:45 AM, Nicholas Chammas <
>>>>>>>>>>>>> nicholas.cham...@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Red Hat supports Python 2.6 on RHEL 5 until 2020
>>>>>>>>>>>>>> <https://alexgaynor.net/2015/mar/30/red-hat-open-source-community/>,
>>>>>>>>>>>>>> but otherwise yes, Python 2.6 is ancient history and the core
>>>>>>>>>>>>>> Python
>>>>>>>>>>>>>> developers stopped supporting it in 2013. RHEL 5 is not a good
>>>>>>>>>>>>>> enough
>>>>>>>>>>>>>> reason to continue support for Python 2.6 IMO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> We should aim to support Python 2.7 and Python 3.3+ (which I
>>>>>>>>>>>>>> believe we currently do).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Nick
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 8:01 AM Allen Zhang <
>>>>>>>>>>>>>> allenzhang...@126.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> plus 1,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> we are currently using python 2.7.2 in production
>>>>>>>>>>>>>>> environment.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2016-01-05 18:11:45, "Meethu Mathew" <
>>>>>>>>>>>>>>> meethu.mat...@flytxt.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +1
>>>>>>>>>>>>>>> We use Python 2.7
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Meethu Mathew
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Jan 5, 2016 at 12:47 PM, Reynold Xin <
>>>>>>>>>>>>>>> r...@databricks.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Does anybody here care about us dropping support for Python
>>>>>>>>>>>>>>>> 2.6 in Spark 2.0?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Python 2.6 is ancient, and is pretty slow in many aspects
>>>>>>>>>>>>>>>> (e.g. json parsing) when compared with Python 2.7. Some
>>>>>>>>>>>>>>>> libraries that
>>>>>>>>>>>>>>>> Spark depend on stopped supporting 2.6. We can still convince
>>>>>>>>>>>>>>>> the library
>>>>>>>>>>>>>>>> maintainers to support 2.6, but it will be extra work. I'm
>>>>>>>>>>>>>>>> curious if
>>>>>>>>>>>>>>>> anybody still uses Python 2.6 to run Spark.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>>
>>
>
--
Best Regards
Jeff Zhang
> If you're using fair scheduling and don't set a pool,
> the default pool will be used.
>
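For reference, a sketch of how a pool is normally attached per thread (the pool name is made up; the pools themselves are defined in the fair scheduler allocation file):

// Jobs submitted from this thread go to the named pool...
sc.setLocalProperty("spark.scheduler.pool", "pool1")
sc.parallelize(1 to 100).count()
// ...and clearing the property sends later jobs back to the default pool.
sc.setLocalProperty("spark.scheduler.pool", null)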
> On Tue, Jan 5, 2016 at 1:57 AM, Jeff Zhang <zjf...@gmail.com> wrote:
>
>>
>> It seems currently spark.scheduler.pool must be set as a local property
>> (associated with thr
;hse") return a column not the row data
> What am I missing here?
>
--
Best Regards
Jeff Zhang
fore it gets cleared up.
>
> Would the driver not wait till all the stuff related to test1 is completed
> before calling test2 as test2 is dependent on test1?
>
> val test1 = RDD1.mapPartitions(...)
>
> val test2 = test1.mapPartitions(...)
>
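A small sketch of the lazy-evaluation point (the RDD and the work inside mapPartitions are made up):

val rdd1 = sc.parallelize(1 to 1000)
// These two calls return immediately; they only record the lineage.
val test1 = rdd1.mapPartitions(iter => iter.map(_ * 2))
val test2 = test1.mapPartitions(iter => iter.map(_ + 1))
// Nothing runs until an action is called; Spark then pipelines both
// functions over each partition.
test2.count()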
> On Sat, Dec 19, 2015 at 12:24 AM, Jeff Zh
Exception {
>
> logger.info("AEDWIP s:{}", s);
>
> String ret = s.equalsIgnoreCase(category1) ? category1 :
> category3;
>
> return ret;
>
> }
>
> }
>
>
> public class Features implements Serializab
efor can't find the parse method
>
> Any idea on how to solve this depdendency problem?
>
> thanks in advance
>
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
jar and restart all the namenode on yarn ?
>
>
>
> Thanks a lot.
>
>
>
> Mars
>
>
>
--
Best Regards
Jeff Zhang
> val z = sc.parallelize(List("12","23","345",""),2)
> z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
> + y)
> res143: String = 10
>
> Scenario2:
> val z = sc.parallelize(List("12","23","","345"),2)
> z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x
> + y)
> res144: String = 11
>
> Why is the result different? I was expecting 10 for both, also for the
> first partition.
>
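For what it's worth, a trace of how aggregate applies the zero value once per partition may explain it (assuming the list is sliced evenly, so the two partitions are as shown in the comments):

val z1 = sc.parallelize(List("12","23","345",""), 2)
z1.aggregate("")((x, y) => math.min(x.length, y.length).toString, (x, y) => x + y)
// partition 1 ("12","23"): "" -> min(0,2)=0 -> "0" -> min(1,2)=1 -> "1"
// partition 2 ("345",""):  "" -> min(0,3)=0 -> "0" -> min(1,0)=0 -> "0"
// combine: "" + "1" + "0" = "10"

val z2 = sc.parallelize(List("12","23","","345"), 2)
z2.aggregate("")((x, y) => math.min(x.length, y.length).toString, (x, y) => x + y)
// partition 2 ("","345"):  "" -> min(0,0)=0 -> "0" -> min(1,3)=1 -> "1"
// combine: "" + "1" + "1" = "11"
// The position of the empty string changes that partition's result, and the
// order in which partition results are combined is not guaranteed either.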
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
> I'm launching my applications through YARN. Where will these properties be
> logged? I guess they won't be part of the YARN logs.
>
> 2015-12-28 13:22 GMT+01:00 Jeff Zhang <zjf...@gmail.com>:
>
>> Setting spark.logConf to true in spark-defaults.conf will log the properties on the
>> driver sid
he.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
> at
> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
> Regards,
> Rajesh
>
--
Best Regards
Jeff Zhang
t com.cisco.ss.etl.Main.main(Main.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:672)
>>> at
>>> org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>> Regards,
>>> Rajesh
>>>
>>
>>
>
--
Best Regards
Jeff Zhang
super(f, dataType, inputTypes);
>
> ??? Why do I have to implement this constructor ???
>
> ??? What are the arguments ???
>
> }
>
>
>
> @Override
>
> public
>
> Column apply(scala.collection.Seq exprs) {
>
> What do you do with a scala seq?
>
> return ???;
>
> }
>
> }
>
> }
>
>
>
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
--
Best Regards
Jeff Zhang
;
--
Best Regards
Jeff Zhang
1:24:29,942 INFO [Thread-6] regionserver.ShutdownHook:
> Starting fs shutdown hook thread.
> 2015-12-17 21:24:29,953 INFO [Thread-6] regionserver.ShutdownHook:
> Shutdown hook finished.
>
>
>
--
Best Regards
Jeff Zhang
ee the job still running (in
> yarn resource manager) after final hive insert is complete.
>
> The code flow is
>
> start context
> do somework
> insert to hive
> sc.stop
>
> This is a Sparkling Water job, if that matters.
>
> Is there anything else needed ?
>
> Thanks,
>
> J
>
>
>
--
Best Regards
Jeff Zhang
> scala> df.withColumn("line2",df2("line"))
>
> org.apache.spark.sql.AnalysisException: resolved attribute(s) line#2330
> missing from line#2326 in operator !Project [line#2326,line#2330 AS
> line2#2331];
>
>
>
> Thanks and Regards,
> Vishnu Viswanath
> *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
>
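For illustration, one workaround (a sketch, assuming the two DataFrames share a join key; the "id" column here is made up) is to bring the column over with a join instead of referencing the other frame directly:

// withColumn can only reference columns of the DataFrame it is called on,
// so rename df2's column and join on the shared key.
val joined = df.join(df2.withColumnRenamed("line", "line2"), "id")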
--
Best Regards
Jeff Zhang
ta which I need to predict.
>> And
>> > suppose the new dataset has C as the most frequent word, followed by B
>> and
>> > A. So the StringIndexer will assign index as
>> >
>> > C 0.0
>> > B 1.0
>> > A 2.0
>> >
>> > These indexes are different from what we used for modeling. So won’t
>> this
>> > give me a wrong prediction if I use StringIndexer?
>> >
>> > --
>> > Thanks and Regards,
>> > Vishnu Viswanath,
>> > www.vishnuviswanath.com
>>
>
>
>
> --
> Thanks and Regards,
> Vishnu Viswanath,
> *www.vishnuviswanath.com <http://www.vishnuviswanath.com>*
>
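For what it's worth, the usual pattern is to fit the indexer once on the training data and reuse the fitted model, so the label-to-index mapping stays fixed (a sketch; trainingDF, newDF and the column names are made up):

import org.apache.spark.ml.feature.StringIndexer

// Fitting freezes the label-to-index mapping based on the training data.
val indexerModel = new StringIndexer()
  .setInputCol("word")
  .setOutputCol("wordIndex")
  .fit(trainingDF)
// Reusing the same fitted model keeps A/B/C mapped to the same indices on the
// new data, regardless of their frequencies there.
val indexedNew = indexerModel.transform(newDF)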
--
Best Regards
Jeff Zhang
. Is
this because the interface is still unstable now?
--
Best Regards
Jeff Zhang
data?
>
> Many thanks in advance for any responses.
>
> Cheers!
>
--
Best Regards
Jeff Zhang
t;
>
>
>
>
--
Best Regards
Jeff Zhang
> From a performance perspective, what enhancements can I make
> to make it better?
>
> Thanks
>
> --
> --Anfernee
>
--
Best Regards
Jeff Zhang
xception
> at repro.Repro$$anonfun$main$2.apply$mcZI$sp(repro.scala:49)
> at repro.Repro$$anonfun$main$2.apply
the parquet document.
>
> http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
>
> Thanks
>
--
Best Regards
Jeff Zhang
>>
>>
>
>
> --
> Regards,
> Vipul Rai
> www.vipulrai.me
> +91-8892598819
> <http://in.linkedin.com/in/vipulrai/>
>
--
Best Regards
Jeff Zhang
gards,
> Vipul
>
> On 23 November 2015 at 14:25, Jeff Zhang <zjf...@gmail.com> wrote:
>
>> >>> Do I need to create a new DataFrame for every update to the
>> DataFrame like
>> addition of new column or need to update the original sales DataFrame
/application_1463194314221_211370/spark-3cc37dc7-fa3c-4b98-aa60-0acdfc79c725/28/shuffle_8553_38_0.index
>> (No such file or directory)
>>
>> any idea about this error ?
>> --
>> Thanks,
>> Kishore.
>>
>
>
>
> --
> Thanks,
> Kishore.
>
--
Best Regards
Jeff Zhang
>
--
Best Regards
Jeff Zhang
> at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
> at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
> at py4j.Gateway.invoke(Gateway.java:259)
> at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
> at py4j.commands.CallCommand.execute(CallCommand.java:79)
> at py4j.GatewayConnection.run(GatewayConnection.java:209)
> at java.lang.Thread.run(Thread.java:745)
>
>
>
--
Best Regards
Jeff Zhang
,
x2*x3, x3*x1, x3*x2,x3*x3)
(3,[0,2],[1.0,1.0]) -->
(9,[0,1,5,6,8],[1.0,1.0,1.0,1.0,1.0])|
--
Best Regards
Jeff Zhang
examples.SparkPi
>> --master yarn-client --driver-memory 512m --num-executors 2
>> --executor-memory 512m --executor-cores 210:
>>
>>
>>
>>- Error: Could not find or load main class
>>org.apache.spark.deploy.yarn.ExecutorLauncher
>>
>> But when I don't configure that parameter, there is no error. Why? Is that parameter only
>> meant to avoid uploading the resource file (jar package)?
>>
>
>
--
Best Regards
Jeff Zhang
thoughts on how to track down
> what is happening here?
>
> Thanks!
>
> Pierre.
>
--
Best Regards
Jeff Zhang