[OSS DIGEST] The major changes of Apache Spark from May 20 to June 2

2020-07-07 Thread wuyi
Hi all,

This is the bi-weekly Apache Spark digest from the Databricks OSS team.
For each API/configuration/behavior change, an [API] tag is added in the
title.


CORE

[API][8.0][SPARK-29150][CORE] Update RDD API for Stage level scheduling to be
public (+29, -25)

This PR makes the RDD API for stage level scheduling public.
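
As a rough illustration of what the now-public API enables (a sketch based on
the stage-level scheduling interfaces; the exact names and signatures are
assumptions and may differ from the snapshot covered by this digest):

    import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

    // Sketch only: build a ResourceProfile and attach it to an RDD so the
    // stage computing that RDD can request its own executor/task resources.
    val execReqs = new ExecutorResourceRequests().cores(4).memory("6g")
    val taskReqs = new TaskResourceRequests().cpus(2)
    val profile = new ResourceProfileBuilder().require(execReqs).require(taskReqs).build

    // e.g. in spark-shell, where sc is the SparkContext
    val rdd = sc.parallelize(1 to 100)
    val withProfile = rdd.withResources(profile)
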
[API][7.1][SPARK-8981][CORE] Add MDC support in Executor (+27, -1)

This PR added MDC (Mapped Diagnostic Context) support for task threads. By
default, each log line printed by the same task thread will include the same
unique task name. Users can also add custom content to logs (for example,
application IDs/names) by configuring the log4j pattern. This is important
when a cluster is shared by different users/applications.

Before:

scala> testDf.collect()
...
20/04/28 16:41:58 WARN MemoryStore: Failed to reserve initial memory threshold of 1024.0 KB for computing block broadcast_1 in memory.
20/04/28 16:41:58 WARN MemoryStore: Not enough space to cache broadcast_1 in memory! (computed 384.0 B so far)
20/04/28 16:41:58 WARN MemoryStore: Failed to reserve initial memory threshold of 1024.0 KB for computing block broadcast_0 in memory.
20/04/28 16:41:58 WARN MemoryStore: Not enough space to cache broadcast_0 in memory! (computed 384.0 B so far)
20/04/28 16:41:58 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
20/04/28 16:41:58 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0.
20/04/28 16:41:58 WARN RowBasedKeyValueBatch: Failed to allocate page (1048576 bytes).
20/04/28 16:41:58 WARN RowBasedKeyValueBatch: Failed to allocate page (1048576 bytes).
20/04/28 16:41:58 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 262144 bytes of memory, got 22200
...

After (please note the end of each line):

scala> testDf.collect()
...
20/04/28 16:40:59 WARN MemoryStore: Failed to reserve initial memory threshold of 1024.0 KB for computing block broadcast_1 in memory. [task 1.0 in stage 0.0]
20/04/28 16:40:59 WARN MemoryStore: Not enough space to cache broadcast_1 in memory! (computed 384.0 B so far) [task 1.0 in stage 0.0]
20/04/28 16:40:59 WARN MemoryStore: Failed to reserve initial memory threshold of 1024.0 KB for computing block broadcast_0 in memory. [task 1.0 in stage 0.0]
20/04/28 16:40:59 WARN MemoryStore: Not enough space to cache broadcast_0 in memory! (computed 384.0 B so far) [task 1.0 in stage 0.0]
20/04/28 16:40:59 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0. [task 0.0 in stage 0.0]
20/04/28 16:40:59 WARN RowBasedKeyValueBatch: Calling spill() on RowBasedKeyValueBatch. Will not spill but return 0. [task 1.0 in stage 0.0]
20/04/28 16:40:59 WARN RowBasedKeyValueBatch: Failed to allocate page (1048576 bytes). [task 0.0 in stage 0.0]
20/04/28 16:40:59 WARN RowBasedKeyValueBatch: Failed to allocate page (1048576 bytes). [task 1.0 in stage 0.0]
20/04/28 16:41:00 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 262144 bytes of memory, got 22200 [task 0.0 in stage 0.0]
...
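
For background on the mechanism (a generic slf4j MDC sketch, not Spark's
specific wiring; the key name and pattern below are illustrative assumptions):
values put into the MDC of a thread can be referenced from a log4j pattern via
%X{key}, which is how a per-task name can end up on every line and how users
can append their own context.

    import org.slf4j.{LoggerFactory, MDC}

    // Attach custom key/value context to the current thread, then reference it
    // from the logging pattern, e.g. a ConversionPattern containing %X{appName}.
    val log = LoggerFactory.getLogger("example")
    MDC.put("appName", "shared-cluster-app")   // illustrative key and value
    log.warn("Not enough space to cache broadcast_1 in memory!")
    MDC.remove("appName")                      // clean up when the work is done
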
SQL

[API][7.0][SPARK-31750][SQL] Eliminate UpCast if child's dataType is
DecimalType (+52, -8)

Eliminate the UpCast that is implicitly added by Spark if its child data type
is already a decimal type. Otherwise, a case like:

sql("select cast(1 as decimal(38, 0)) as d")
  .write.mode("overwrite")
  .parquet(f.getAbsolutePath)

spark.read.parquet(f.getAbsolutePath).as[BigDecimal]

could fail as follows:

[info] org.apache.spark.sql.AnalysisException: Cannot up cast `d` from decimal(38,0) to decimal(38,18).
[info] The type path of the target object is:
[info] - root class: "scala.math.BigDecimal"
[info] You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object;
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3060)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$33$$anonfun$applyOrElse$174.applyOrElse(Analyzer.scala:3087)
[info]   at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$33$$anonfun$applyOrElse$174.applyOrElse(Analyzer.scala:3071)
[info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309)
[info]   at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72)
[info]   at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309)
[info]   at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314)

[API][7.0][SPARK-31755][SQL] Allow missing year/ho

Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-05 Thread wuyi
OK, after taking another look, I think it only affects the local-cluster
deploy mode, which is for testing only.


wuyi wrote
> Please also include https://issues.apache.org/jira/browse/SPARK-32120 in
> Spark 3.0.1. It's a regression compared to Spark 3.0.0-preview2.
> 
> Thanks,
> Yi Wu
> 
> 
> Yuanjian Li wrote
>> Hi dev-list,
>> 
>> I’m writing this to raise the discussion about Spark 3.0.1 feasibility
>> since 4 blocker issues were found after Spark 3.0.0:
>> 
>> 
>> 1. [SPARK-31990] <https://issues.apache.org/jira/browse/SPARK-31990>
>>    The broken state store compatibility will cause a correctness issue
>>    when a streaming query with `dropDuplicates` uses a checkpoint written
>>    by an old Spark version.
>> 2. [SPARK-32038] <https://issues.apache.org/jira/browse/SPARK-32038>
>>    The regression bug in handling NaN values in COUNT(DISTINCT).
>> 3. [SPARK-31918] <https://issues.apache.org/jira/browse/SPARK-31918> [WIP]
>>    CRAN requires SparkR to work with the latest R 4.0. This makes the 3.0
>>    release unavailable on CRAN, since it only supports R [3.5, 4.0).
>> 4. [SPARK-31967] <https://issues.apache.org/jira/browse/SPARK-31967>
>>    Downgrade vis.js to fix the Jobs UI loading time regression.
>> 
>> 
>> I also noticed branch-3.0 already has 39 commits
>> <https://issues.apache.org/jira/browse/SPARK-32038?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%203.0.1>
>> after Spark 3.0.0. I think it would be great to have Spark 3.0.1 to
>> deliver the critical fixes.
>> 
>> Any comments are appreciated.
>> 
>> Best,
>> 
>> Yuanjian








Re: [DISCUSS] Apache Spark 3.0.1 Release

2020-07-05 Thread wuyi
Please also include https://issues.apache.org/jira/browse/SPARK-32120 in
Spark 3.0.1. It's a regression compared to Spark 3.0.0-preview2.

Thanks,
Yi Wu


Yuanjian Li wrote
> Hi dev-list,
> 
> I’m writing this to raise the discussion about Spark 3.0.1 feasibility
> since 4 blocker issues were found after Spark 3.0.0:
> 
> 
> 1. [SPARK-31990]
>    The broken state store compatibility will cause a correctness issue
>    when a streaming query with `dropDuplicates` uses a checkpoint written
>    by an old Spark version.
> 2. [SPARK-32038]
>    The regression bug in handling NaN values in COUNT(DISTINCT).
> 3. [SPARK-31918] [WIP]
>    CRAN requires SparkR to work with the latest R 4.0. This makes the 3.0
>    release unavailable on CRAN, since it only supports R [3.5, 4.0).
> 4. [SPARK-31967]
>    Downgrade vis.js to fix the Jobs UI loading time regression.
> 
> 
> I also noticed branch-3.0 already has 39 commits
> after Spark 3.0.0. I think it would be great to have Spark 3.0.1 to
> deliver the critical fixes.
> 
> Any comments are appreciated.
> 
> Best,
> 
> Yuanjian








Re: [VOTE] Decommissioning SPIP

2020-07-02 Thread wuyi
+1 for having this feature in Spark






Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread wuyi
This could be a sub-task of
https://issues.apache.org/jira/browse/SPARK-25299 (Use remote storage for
persisting shuffle data)?

It would be good if we could get the whole of SPARK-25299 into Spark 3.1.



Holden Karau wrote
> Should we also consider the shuffle service refactoring to support
> pluggable storage engines as targeting the 3.1 release?
> 
> On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk <maxim.gekk@> wrote:
> 
>> Hi Dongjoon,
>>
>> I would add:
>> - Filters pushdown to JSON (https://github.com/apache/spark/pull/27366)
>> - Filters pushdown to other datasources like Avro
>> - Support nested attributes of filters pushed down to JSON
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>>
>> On Mon, Jun 29, 2020 at 7:07 PM Dongjoon Hyun <dongjoon.hyun@> wrote:
>>
>>> Hi, All.
>>>
>>> After a short celebration of Apache Spark 3.0, I'd like to ask you the
>>> community opinion on Apache Spark 3.1 feature expectations.
>>>
>>> First of all, Apache Spark 3.1 is scheduled for December 2020.
>>> - https://spark.apache.org/versioning-policy.html
>>>
>>> I'm expecting the following items:
>>>
>>> 1. Support Scala 2.13
>>> 2. Use Apache Hadoop 3.2 by default for better cloud support
>>> 3. Declaring Kubernetes Scheduler GA
>>> In my perspective, the last main missing piece was Dynamic allocation, and
>>> - Dynamic allocation with shuffle tracking is already shipped at 3.0.
>>> - Dynamic allocation with worker decommission/data migration is
>>>   targeting 3.1. (Thanks, Holden)
>>> 4. DSv2 Stabilization
>>>
>>> I'm aware of some more features which are currently on the way, but I'd
>>> love to hear the opinions from the main developers and, moreover, the
>>> main users who need those features.
>>>
>>> Thank you in advance. Welcome for any comments.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>
> 
> -- 
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau








Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread wuyi
I've left the comments in SPIP, so let's discuss there.


Holden Karau wrote
> So from the template I believe the SPIP is supposed to be more high level
> and then design goes into the linked “design sketch.” What sort of detail
> would you like to see added?
> 
> On Mon, Jun 29, 2020 at 1:38 AM wuyi <yi.wu@> wrote:
> 
>> Thank you for your effort, Holden.
>>
>> I left a few comments in the SPIP. I asked for some details, though I know
>> some contents have been included in the design doc. I'm not very clear
>> about the difference between the design doc and the SPIP. But from what I
>> saw in the SPIP template questions, I think some details may still be
>> needed.
>>
>>
>> --
>> Yi
>>
>>
>>
>>
>>
> --
> Twitter: https://twitter.com/holdenkarau
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau








Re: UnknownSource NullPointerException in CodeGen. with Custom Strategy

2020-06-29 Thread wuyi
Hi Nasrulla,

Could you give a complete demo to reproduce the issue?






Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread wuyi
Thank you for your effort, Holden.

I left a few comments in the SPIP. I asked for some details, though I know
some contents have been included in the design doc. I'm not very clear about
the difference between the design doc and the SPIP. But from what I saw in
the SPIP template questions, I think some details may still be needed.


--
Yi








Re: [ANNOUNCE] Apache Spark 3.0.0

2020-06-18 Thread wuyi
Congrats!!






Re: Getting the ball started on a 2.4.6 release

2020-04-22 Thread wuyi
We have a conclusion now and we've decided to include SPARK-31509 in the PR
for SPARK-31485. So there should actually be only one candidate (but, to be
honest, it still depends on the committers).

Best,
Yi Wu






Re: Getting the ball started on a 2.4.6 release

2020-04-21 Thread wuyi
I have one: https://issues.apache.org/jira/browse/SPARK-31485, which could
cause an application hang.

And probably also https://issues.apache.org/jira/browse/SPARK-31509, to
provide better guidance on barrier execution for users. But we do not have a
conclusion on that yet.

Best,

Yi Wu






Re: BlockManager and ShuffleManager = can getLocalBytes be ever used for shuffle blocks?

2020-04-18 Thread wuyi
Hi Jacek,

The code in link [2] is out of date. The commit
https://github.com/apache/spark/commit/32ec528e63cb768f85644282978040157c3c2fb7
has already removed the unreachable branch.


Best,
Yi Wu






Re: spark lacks fault tolerance with dynamic partition overwrite

2020-04-07 Thread wuyi
Hi Koert,

The community has recently come back to this issue and there's already a fix
for it: https://github.com/apache/spark/pull/26339.

You can track and review it there.

Best,

Yi Wu






Re: [DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-16 Thread wuyi
Thanks Sean and Takeshi.

Option 1 seems really impossible, so I'm going to take Option 2 as an
alternative.






Re: [DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-14 Thread wuyi
Hi Takeshi, thanks for your reply.

Before the breakage, we only did the null check for primitive types and left
null values of non-primitive types to the UDF itself, in case they need to be
handled specifically; e.g., a UDF may return something else for a null String.






[DISCUSS] Null-handling of primitive-type of untyped Scala UDF in Scala 2.12

2020-03-13 Thread wuyi
Hi all, I'd like to raise a discussion here about the null-handling of
primitive types in the untyped Scala UDF [ udf(f: AnyRef, dataType: DataType) ].

After we switched to Scala 2.12 in 3.0, the untyped Scala UDF is broken
because we can no longer use reflection to get the parameter types of the
Scala lambda. This leads to silent result changes. For example, with a UDF
defined as `val f = udf((x: Int) => x, IntegerType)`, the query
`select f($"x")` has different behavior between 2.4 and 3.0 when the input
value of column x is null:

Spark 2.4:  null
Spark 3.0:  0
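
To make the difference concrete, here is a minimal sketch (the DataFrame is
illustrative; the per-version results are the ones reported above, and the
released 3.0 additionally deprecates this untyped form):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf
    import org.apache.spark.sql.types.IntegerType

    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    import spark.implicits._

    // A one-column DataFrame whose integer column x contains a null.
    val df = spark.sql("SELECT 1 AS x UNION ALL SELECT CAST(NULL AS INT) AS x")

    // Untyped Scala UDF: on Scala 2.12 the Int parameter type cannot be
    // recovered by reflection, so the null check for the primitive is lost.
    val f = udf((x: Int) => x, IntegerType)

    df.select(f($"x")).show()
    // Spark 2.4: 1 and null
    // Spark 3.0: 1 and 0 (the null silently becomes the default Int value)
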

Because of this, we deprecated the untyped Scala UDF in 3.0 and recommend that
users use the typed one. However, I recently identified several valid use
cases, e.g., `val f = (r: Row) => Row(r.getAs[Int](0) * 2)`, where the schema
cannot be detected by the typed Scala UDF [ udf[RT: TypeTag, A1: TypeTag](f:
Function1[A1, RT]) ].

There are 3 possible solutions:

1. Find a way to get the Scala lambda parameter types by reflection (I tried
   very hard but had no luck; the Java SAM type is too dynamic).
2. Support case classes as the input of the typed Scala UDF, so at least
   people can still handle struct-type input columns with a UDF.
3. Add a new variant of the untyped Scala UDF in which users can specify the
   input types.

I'd like to see more feedback or ideas about how to move forward.

Thanks,
Yi Wu






[DISCUSS] Introduce WorkerOffer reservation mechanism for barrier scheduling

2019-03-13 Thread wuyi
Currently, a barrier TaskSet has a hard requirement that its tasks can only be
launched in a single resourceOffers() round with enough slots (or say,
sufficient resources), but even with enough slots launching cannot be
guaranteed, due to task locality delay scheduling (also see the discussion at
https://github.com/apache/spark/pull/21758#discussion_r204917245). So it is
very likely that a barrier TaskSet finally gets a chunk of sufficient
resources after all the trouble, but lets it go just because one of its
pending tasks cannot be scheduled. Also, it is hard to make all tasks launch
at the same time, which brings complexity for fault tolerance. Furthermore, it
causes severe resource competition between TaskSets and jobs and introduces
unclear semantics for DynamicAllocation (see the discussion at
https://github.com/apache/spark/pull/21758#discussion_r204917880).

So here I want to introduce a new mechanism, the WorkerOffer Reservation
Mechanism, for barrier scheduling. With the WorkerOffer Reservation Mechanism,
a barrier TaskSet could reserve WorkerOffers across multiple resourceOffers()
rounds, and launch its tasks at the same time once it has accumulated
sufficient resources.

The whole process looks like this (a rough code sketch of the bookkeeping
follows the list):

* [1] CoarseGrainedSchedulerBackend calls
TaskScheduler#resourceOffers(offers).

* [2] In resourceOffers(), we first exclude the cpus reserved by barrier
TaskSets in previous resourceOffers() rounds.

* [3] If a task (CPU_PER_TASK = 2) from a barrier TaskSet could launch on a
WorkerOffer(hostA, cores=5), let's say WO1, then we reserve 2 cpus from WO1
for this task. So, in the next resourceOffers() round, those 2 cpus are
excluded from WO1; in other words, in the next round WO1 has 3 cpus left to
offer. And we regard this task as a ready task.

* [4] After one or multiple resourceOffers() rounds, when the number of the
barrier TaskSet's ready tasks reaches the TaskSet's numTasks, we launch all
of the ready tasks at the same time.
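
A rough sketch of the bookkeeping described in steps [2]-[4] (hypothetical
names such as ReservationState, reserve and availableCores are illustrative
only, not existing Spark APIs):

    import scala.collection.mutable

    case class WorkerOffer(executorId: String, host: String, cores: Int)

    class ReservationState(cpusPerTask: Int) {
      // executorId -> cores already reserved by pending barrier tasks
      private val reservedCores = mutable.Map.empty[String, Int].withDefaultValue(0)
      private var readyTasks = 0

      // Cores still offerable after excluding earlier reservations (step [2]).
      def availableCores(offer: WorkerOffer): Int =
        offer.cores - reservedCores(offer.executorId)

      // Try to reserve cores for one barrier task on this offer (step [3]).
      def reserve(offer: WorkerOffer): Boolean =
        if (availableCores(offer) >= cpusPerTask) {
          reservedCores(offer.executorId) += cpusPerTask
          readyTasks += 1
          true
        } else {
          false
        }

      // Launch all tasks together once every task has a reservation (step [4]).
      def canLaunchAll(numTasks: Int): Boolean = readyTasks == numTasks
    }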

Besides, we have two features that come along with the WorkerOffer
Reservation Mechanism:

* To avoid a deadlock that could be introduced by several barrier TaskSets
holding their reserved WorkerOffers for a long time, we'll ask barrier
TaskSets to force-release part of their reserved WorkerOffers on demand. So
it is highly likely that each barrier TaskSet will eventually be launched.

* A barrier TaskSet could replace an old, high-level-locality reserved
WorkerOffer with a new, low-level-locality WorkerOffer while it is waiting
for sufficient resources, to achieve better locality in the end.

And there's a possibility for the WorkerOffer Reservation Mechanism to work
with the ExecutorAllocationManager (aka DynamicAllocation):

When cpus in a WorkerOffer are reserved, we send a new event, called
ExecutorReservedEvent, to the EAM, which indicates that the corresponding
executor's resources are being reserved. On receiving that event, the EAM
should not regard the executor as idle and remove it later; instead, it keeps
the executor (maybe for a confined time), as it knows someone may use it
later. Similarly, we send an ExecutorReleasedEvent when reserved cpus are
released.

The WorkerOffer Reservation Mechanism will not impact non-barrier TaskSets;
they keep the same behavior as today.

To summarize:

* The WorkerOffer Reservation Mechanism relaxes the resource requirement,
since a barrier TaskSet could be launched after multiple resourceOffers()
rounds;

* barrier tasks are guaranteed to be launched at the same time;

* it provides a possibility to work with the ExecutorAllocationManager.


Actually, I've already filed JIRA SPARK-26439 and PR #24010 for this (but
they got little attention); anyone interested could look at the code
directly.

So, does anyone have any thoughts on this? (Personally, I think it would
really do good for barrier scheduling.)






Re: Cannot run program ".../jre/bin/javac": error=2, No such file or directory

2018-12-01 Thread wuyi
Hi Owen, thanks for your suggestion.

I rechecked my env and did not find anything wrong with JAVA_HOME. But I
agree with you that there must be something wrong with the system env.

Currently, I created a link file (named javac) under $JAVA_HOME/jre/bin that
links to $JAVA_HOME/bin/javac to work around this problem. And it works fine
for me.






Cannot run program ".../jre/bin/javac": error=2, No such file or directory

2018-12-01 Thread wuyi
Dear all, 

I used the cmd "./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.6
-DskipTests clean package" to compile Spark 2.4, but it failed on Spark
Project Tags, throwing the error:

Cannot run program
"/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/bin/javac":
error=2, No such file or directory

And I double-checked my
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home,
and the cmd "java -version" can be executed successfully, as can "javac -help".


Here's some compile info:

[INFO] Using zinc server for incremental compilation
[INFO] Toolchain in scala-maven-plugin:
/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre
[warn] Pruning sources from previous analysis, due to incompatible CompileSetup.
[info] Compiling 2 Scala sources and 8 Java sources to
/Users/wuyi/workspace/spark/common/tags/target/scala-2.12/classes...
[error] Cannot run program
"/Library/Java/JavaVirtualMachines/jdk1.8.0_131.jdk/Contents/Home/jre/bin/javac":
error=2, No such file or directory


I guess Maven should find javac under the directory $JAVA_HOME/bin, but why
does it go to $JAVA_HOME/jre/bin?

I'd appreciate it a lot if any devs could give me a hint. Thanks.

Best wishes. 
wuyi







Re: ClassNotFoundException while running unit test with local cluster mode in Intellij IDEA

2018-01-30 Thread wuyi
Hi, cloud0fan,

I tried it and that's really good and cool! Thanks again!






Re: ClassNotFoundException while running unit test with local cluster mode in Intellij IDEA

2018-01-30 Thread wuyi
Hi, cloud0fan.
Yeah, tests run well in SBT. 
Maybe I should try your way. Thanks!






ClassNotFoundException while running unit test with local cluster mode in Intellij IDEA

2018-01-30 Thread wuyi
Dear devs,

I've been stuck on this issue for several days, and I need help now.
At first, I ran into an old issue, which is the same as
http://apache-spark-developers-list.1001551.n3.nabble.com/test-cases-stuck-on-quot-local-cluster-mode-quot-of-ReplSuite-td3086.html

So, I checked my assembly jar, added it to the dependencies of the core
project (I run the unit test within this sub-project), and I set SPARK_HOME
(even though I did not have a wrong SPARK_HOME before).

After that, the unit test with local-cluster mode no longer blocks all the
time, but it throws a ClassNotFoundException, such as:

Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most
recent failure: 
Lost task 1.3 in stage 0.0 (TID 5, localhost, executor 3):
java.lang.ClassNotFoundException:
org.apache.spark.broadcast.BroadcastSuite$$anonfun$15$$anonfun$16
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
...
Driver stacktrace:
org.apache.spark.SparkException: Job aborted due to stage failure: 
Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in
stage 0.0 (TID 5, localhost, executor 3): java.lang.ClassNotFoundException:
org.apache.spark.broadcast.BroadcastSuite$$anonfun$15$$anonfun$16
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
...

And then I tried rebuilding the whole Spark project or just the core project,
building/testing the core project, adding the
'spark.driver.extraClassPath/spark.executor.extraClassPath' params, and so on,
but all of it failed.

Maybe I missed something when trying to run a unit test with local-cluster
mode in IntelliJ IDEA.
I'd appreciate it a lot if anyone could give me a hint. Thanks.

Best wishes.
wuyi


