I’d vote my +1 first.
On 2021/11/13 02:25:05 "L. C. Hsieh" wrote:
> Hi all,
>
> I’d like to start a vote for SPIP: Row-level operations in Data Source V2.
>
> The proposal is to add support for executing row-level operations
> such as DELETE, UPDATE, MERGE for v2 tables (SPARK-35801). The
>
Hi all,
The vote passed with the following 9 +1 votes and no -1 or +0 votes:
Liang-Chi Hsieh*
Russell Spitzer
Dongjoon Hyun*
Huaxin Gao
Ryan Blue
DB Tsai*
Holden Karau*
Cheng Su
Wenchen Fan*
* = binding
Thank you all for your feedback and votes.
+1. Docs look good. Binary looks good.
Ran simple tests and some TPC-DS queries.
Thanks for working on this!
wuyi wrote
> Please vote on releasing the following candidate as Apache Spark version
> 3.0.3.
>
> The vote is open until Jun 21st 3AM (PST) and passes if a majority of +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> https://github.com/apache/spark/graphs/commit-activity
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jun 16, 2021 at 9:19 AM Liang-Chi Hsieh
> viirya@
> wrote:
>
>> First, thanks for volunteering as the release manager of Spark 3.2.0,
>
> https://issues.apache.org/jira/browse/SPARK-10816
> - Add RocksDB StateStore as external module
> https://issues.apache.org/jira/browse/SPARK-34198
>
>
> I wonder whether we should postpone the branch cut date.
> cc Min Shen, Yi Wu, Max Gekk, Huaxin Gao, Jungtaek Lim, Yuan
+1. Thank you!
Liang-Chi
Dongjoon Hyun-2 wrote
> +1, Thank you! :)
>
> Bests,
> Dongjoon.
>
> On Tue, Jun 8, 2021 at 9:05 PM Kent Yao
> yaooqinn@
> wrote:
>
>> +1. Thanks, Yi ~
>>
>> Bests,
>> *Kent Yao *
>> @ Data Science Center, Hangzhou Research Institute, NetEase Corp.
>> *a spark
Thank you, Dongjoon!
Takeshi Yamamuro wrote
> Thank you, Dongjoon!
>
> On Wed, Jun 2, 2021 at 2:29 PM Xiao Li
> lixiao@
> wrote:
>
>> Thank you!
>>
>> Xiao
>>
>> On Tue, Jun 1, 2021 at 9:29 PM Hyukjin Kwon
> gurwls223@
> wrote:
>>
>>> awesome!
>>>
>>> On Wed, Jun 2, 2021 at 9:59 AM, Dongjoon
+1 (non-binding)
Binary and docs look good. JIRA tickets look good. Ran simple tasks.
Thank you, Dongjoon!
Hyukjin Kwon wrote
> +1
>
> On Wed, May 26, 2021 at 9:00 AM, Cheng Su
> chengsu@.com
> wrote:
>
>> +1 (non-binding)
>>
>>
>>
>> Checked the related commits in commit history manually.
>>
+1
Thanks Takeshi!
Prashant Sharma wrote
> +1
>
> On Thu, May 20, 2021 at 7:08 PM Wenchen Fan
> cloud0fan@
> wrote:
>
>> +1
>>
>> On Thu, May 20, 2021 at 11:59 AM Dongjoon Hyun
> dongjoon.hyun@
>
>> wrote:
>>
>>> +1.
>>>
>>> Thank you, Takeshi.
>>>
>>> On Wed, May 19, 2021 at 7:49 PM
We are happy to announce the availability of Spark 2.4.8!
Spark 2.4.8 is a maintenance release containing stability, correctness, and
security fixes.
This release is based on the branch-2.4 maintenance branch of Spark. We
strongly recommend that all 2.4 users upgrade to this stable release.
To
+1 sounds good. Thanks Dongjoon for volunteering on this!
Liang-Chi
Dongjoon Hyun-2 wrote
> Hi, All.
>
> Since Apache Spark 3.1.1 tag creation (Feb 21),
> 172 new patches, including 9 correctness patches and 4 K8s patches, arrived
> at branch-3.1.
>
> Shall we make a new release, Apache Spark
The vote passes. Thanks to all who helped with the release!
(* = binding)
+1:
- Dongjoon Hyun *
- Takeshi Yamamuro
- Maxim Gekk
- John Zhuge
- Hyukjin Kwon *
- Kent Yao
- Sean Owen *
- Kousuke Saruta
- Holden Karau *
- Wenchen Fan *
- Mridul Muralidharan *
- Ismaël Mejía
+0: None
-1: None
The staging repository for this release can be accessed now too:
https://repository.apache.org/content/repositories/orgapachespark-1383/
Thanks for the guidance.
Liang-Chi Hsieh wrote
> It seems it is closed now after clicking the close button in the UI.
It seems it is closed now after clicking the close button in the UI.
Sean Owen-2 wrote
> Is there a separate process that pushes to maven central? That's what we
> have to have in the end.
>
> On Tue, May 11, 2021, 12:31 PM Liang-Chi Hsieh
> viirya@
> wrote:
>
>> I d
Oh, I see. We cannot do a release on it as it is still in open status.
Okay, let me try to close it manually via the UI.
Sean Owen-2 wrote
> Is there a separate process that pushes to maven central? That's what we
> have to have in the end.
>
> On Tue, May 11, 2021, 12:31 PM Liang-Chi Hsie
I don't know what will happen if I manually close it now.
Not sure if the current status causes a problem? If not, maybe leave it as it
is?
Sean Owen-2 wrote
> Hm, yes I see it at
> http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee=on=index
> but not on keyserver.ubuntu.com for
I did upload my public key to
https://dist.apache.org/repos/dist/dev/spark/KEYS.
I also uploaded it to a public keyserver before cutting RC1.
I also just tried to search for the public key and can find it.
cloud0fan wrote
> [image: image.png]
>
> I checked the log in
Yeah, I don't know why it happens.
I remember RC1 also had the same issue, but RC2 and RC3 didn't.
Does it affect the RC?
John Zhuge wrote
> Got this error when browsing the staging repository:
>
> 404 - Repository "orgapachespark-1383 (staging: open)"
> [id=orgapachespark-1383] exists but is
Please vote on releasing the following candidate as Apache Spark version
2.4.8.
The vote is open until May 14th at 9AM PST and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.4.8
[ ] -1 Do not release this package because
> From: Liang-Chi Hsieh
> viirya@
>
> To:
> dev@.apache
> Date: 04/30/2021 03:12 PM
> Subject:
Hi all,
Thanks for actively voting. Unfortunately, we found a very old bug
(SPARK-35278), and the fix (https://github.com/apache/spark/pull/32404) is
going to be merged soon. We may have to fail this RC3.
I will cut RC4 as soon as the fix is merged.
Thank you!
I am fine with having the RocksDB state store as a built-in state store. Actually,
the proposal to have it as an external module is to avoid the concern raised in
the previous effort.
The need to mark it experimental doesn't necessarily mean having it as an
external module, I think. They are two different things. So I
Please vote on releasing the following candidate as Apache Spark version
2.4.8.
The vote is open until May 4th at 9AM PST and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.4.8
[ ] -1 Do not release this package because ...
Thanks all for voting. Unfortunately, we found a long-standing correctness
bug, SPARK-35080, and 2.4 is affected too. That said, we need to drop RC2
in favor of RC3.
The fix is ready for merging at https://github.com/apache/spark/pull/32179.
Please vote on releasing the following candidate as Apache Spark version
2.4.8.
The vote is open until Apr 15th at 9AM PST and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.4.8
[ ] -1 Do not release this package because
I'm working on the fix for master. I think the fix is the same for 2.4.
Okay. So I think we are in favor of RC2, and RC1 is dropped. Then I will get
the fix merged first and then prepare RC2.
Thank you.
Liang-Chi
Mridul Muralidharan wrote
> Do we have a fix for this in 3.x/master which can
Thanks for voting.
After I started running the release script to cut RC1, I found a
nested column pruning bug, SPARK-34963, and unfortunately it exists in 2.4.7
too. As RC1 is already cut, I will continue this voting.
The bug looks like a corner case to me and it has not been reported yet since we support
Please vote on releasing the following candidate as Apache Spark version
2.4.8.
The vote is open until Apr 10th at 9AM PST and passes if a majority +1 PMC
votes are cast, with a minimum of 3 +1 votes.
[ ] +1 Release this package as Apache Spark 2.4.8
[ ] -1 Do not release this package because
Hi Mingjia,
Thanks for fixing it. I can see it is included.
Liang-Chi
mingjia-2 wrote
> Hi, All.
>
> I fixed SPARK-32708
> https://issues.apache.org/jira/browse/SPARK-32708
> a while ago after 2.4.7 release.
> PR:https://github.com/apache/spark/pull/29564
>
> Since it's not listed as
Thanks Hyukjin and Dongjoon! :)
Then I will start RC.
Dongjoon Hyun-2 wrote
> Given that Maven passed already with that profile and you tested locally,
> I'm +1 for starting the RC.
>
> Thanks,
> Dongjoon.
>
> On Sun, Apr 4, 2021 at 2:24 AM Hyukjin Kwon
> gurwls223@
> wrote:
>
>> I would
Hi devs,
Currently there are no open or ongoing issues targeting 2.4.
On the QA test dashboard, only spark-branch-2.4-test-sbt-hadoop-2.6 is in red
status. The failed test is
org.apache.spark.sql.streaming.StreamingQueryManagerSuite.awaitAnyTermination
with timeout and resetTerminated. It looks like a flaky
Congrats! Welcome!
Matei Zaharia wrote
> Hi all,
>
> The Spark PMC recently voted to add several new committers. Please join me
> in welcoming them to their new role! Our new committers are:
>
> - Maciej Szymkiewicz (contributor to PySpark)
> - Max Gekk (contributor to Spark SQL)
> - Kent Yao
To update the current status.
The only remaining issue for 2.4 is:
[SPARK-34855][CORE] spark context - avoid using local lazy val for callSite
We are waiting for the author to submit a PR for the 2.4 branch.
Liang-Chi Hsieh wrote
> Thank you so much, Takeshi!
>
>
> Takeshi Yamamuro
+1 (non-binding)
rxin wrote
> +1. Would open up a huge persona for Spark.
>
> On Fri, Mar 26 2021 at 11:30 AM, Bryan Cutler <
> cutlerb@
> > wrote:
>
>>
>> +1 (non-binding)
>>
>>
>> On Fri, Mar 26, 2021 at 9:49 AM Maciej <
> mszymkiewicz@
> > wrote:
>>
>>
>>> +1 (nonbinding)
Thank you so much, Takeshi!
Takeshi Yamamuro wrote
> Hi, viirya
>
> I'm looking now into "SPARK-34607: Add `Utils.isMemberClass` to fix a
> malformed class name error
> on jdk8u" .
>
> Bests,
> Takeshi
>
>
> Takeshi Yamamuro
To update the current status.
There are three tickets targeting 2.4 that are still ongoing.
SPARK-34719: Correctly resolve the view query with duplicated column names
SPARK-34607: Add `Utils.isMemberClass` to fix a malformed class name error
on jdk8u
SPARK-34726: Fix collectToPython timeouts
From a Python developer's perspective, this direction makes sense to me.
As pandas is almost the standard library in this area, if PySpark
supports the pandas API out of the box, usability would reach a higher level.
For maintenance cost, IIUC, there are some Spark committers in the
Thanks Shane for looking at it!
shane knapp ☠ wrote
> ...and just like that, overnight the builds started successfully git
> fetching!
>
> --
> Shane Knapp
> Computer Guy / Voice of Reason
> UC Berkeley EECS Research / RISELab Staff Technical Lead
> https://rise.cs.berkeley.edu
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
I just contacted Shane, and it seems there is an ongoing issue with GitHub
fetches timing out on Jenkins.
That being said, the QA test is currently unavailable. I think it is unsafe
to make a release cut due to the lack of reliable QA test results.
I may defer the cut until the QA test comes back, if there is no objection.
+1 (non-binding).
Thanks for the work!
Erik Krogen wrote
> +1 from me (non-binding)
>
> On Tue, Mar 9, 2021 at 9:27 AM huaxin gao
> huaxin.gao11@
> wrote:
>
>> +1 (non-binding)
Hi devs,
I was going to cut the branch yesterday. I'd like to share the current progress. I
hit a problem during a dry run of the release script and fixed it in SPARK-34672.
The latest dry run looks okay, as build, docs, and publish all succeed. But the
last step (pushing the tag) hits a fatal error; I'm not sure
Thank you Dongjoon.
I'm going to cut the branch now. Hopefully I can make it soon (I need to get
familiar with the process, as it's my first time :) )
Liang-Chi
Dongjoon Hyun-2 wrote
> Thank you, Liang-Chi! Next Monday sounds good.
>
> To All. Please ping Liang-Chi if you have a missed backport.
>
>
Thanks all for the input.
If there is no objection, I am going to cut the branch next Monday.
Thanks.
Liang-Chi
Takeshi Yamamuro wrote
> +1 for releasing 2.4.8 and thanks, Liang-chi, for volunteering.
> Btw, does anyone roughly know how many v2.4 users there still are, based on some stats
> (e.g., # of
Yeah, in short, this is a great compromise approach, and I'd like to see this
proposal move forward to the next step. This discussion is valuable.
Chao Sun wrote
> +1 on Dongjoon's proposal. Great to see this is getting moved forward and
> thanks everyone for the insightful discussion!
>
>
>
>
Thanks Dongjoon!
+1 and I volunteer to do the release of 2.4.8 if it passes.
Liang-Chi
for the inputs and discussion.
Liang-Chi Hsieh
Basically, the proposal makes sense to me and I'd like to support the
SPIP, as it looks like we have a strong need for this important feature.
Thanks Ryan for working on this, and I also look forward to Wenchen's
implementation. Thanks for the discussion too.
Actually I think the
Thank you for the inputs, Yikun! Let's take these inputs into account when we are
ready to have a RocksDB state store in Spark SS.
Yikun Jiang wrote
> I did some work on RocksDB multi-arch support and version upgrades on
> Kafka/Storm/Flink [1][2][3]. To avoid these issues happening in Spark again,
> I
1. okay to add RocksDB StateStore into the sql core module
2. not okay for 1, but okay to add RocksDB StateStore as external module
3. either 1 or 2 is okay
4. not okay to add RocksDB StateStore, no matter into sql core or as
external module
Please let us know if you have some thoughts.
Thank you.
Liang-Chi Hsieh
park 3.1.0.
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Tue, Nov 10, 2020 at 6:17 AM Tom Graves
> tgraves_cs@.com
>
>>> wrote:
>>>
>>>> +1 since its a correctness issue, I think its ok to change the behavior
is, this changes the current behavior, and by default it will break some
existing streaming queries. But I think it is pretty easy to disable the
check with the new config. In the PR there is currently no objection, but a
suggestion to hear more voices. Please let me know if you have some
thoughts.
Thanks.
Liang-Chi
Congrats! Welcome all!
Dongjoon Hyun-2 wrote
> Welcome everyone! :D
>
> Bests,
> Dongjoon.
>
> On Tue, Jul 14, 2020 at 11:21 AM Xiao Li
> lixiao@
> wrote:
>
>> Welcome, Dilip, Huaxin and Jungtaek!
>>
>> Xiao
>>
>> On Tue, Jul 14, 2020 at 11:02 AM Holden Karau
> holden@
>
>> wrote:
>>
Just got a reply from the CRAN admin. It should be fixed now.
Hyukjin Kwon wrote
> Thanks, Liang-chi.
>
> On Thu, 13 Dec 2018, 8:29 am Liang-Chi Hsieh
> viirya@
> wrote:
>
>
>> Sorry for being late. There is a malformed record at the CRAN package page again.
>> I've
>
.
Thanks.
Liang-Chi Hsieh wrote
> Thanks for letting me know! I will look into it and ask CRAN admin for
> help.
>
>
> Hyukjin Kwon wrote
>> Looks like it's happening again. Liang-Chi, do you mind if I ask it again?
>>
>> FYI, R 3.4 is officially deprecated as of
R version. From what I see, mostly because
>>> of fixes and packages support, most users of R are fairly up to date? So
>>> perhaps 3.4 as min version is reasonable esp. for Spark 3.
>>>
>>> Are we getting traction with CRAN sysadmin? It seems like this has been
Yeah, thanks Hyukjin Kwon for bringing this up for discussion.
I don't know how widely higher versions of R are used across the R community. If
R 3.1.x is not very commonly used, I think we can discuss upgrading the
minimum R version in the next Spark release.
If we end up not upgrading,
Congratulations to all new committers!
rxin wrote
> Hi all,
>
> The Apache Spark PMC has recently voted to add several new committers to
> the project, for their contributions:
>
> - Shane Knapp (contributor to infra)
> - Dongjoon Hyun (contributor to ORC support and other parts of Spark)
>
Thanks for pinging me.
It seems to me we should not make assumptions about the value of the
spark.sql.execution.topKSortFallbackThreshold config. Once it is changed,
global sort + limit can currently produce wrong results. I will make a PR
for this.
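For context only: this is an internal SQL config that controls when a sort-plus-limit query falls back from a bounded top-K sort to a full sort. A minimal sketch of how a user might change it (the value 1000 here is purely illustrative, not a recommendation):

```
# spark-defaults.conf -- illustrative value only
spark.sql.execution.topKSortFallbackThreshold  1000
```

The point of the message above is precisely that query results must not depend on whichever value this is set to.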
cloud0fan wrote
> + Liang-Chi and Herman,
>
> I
Hi,
It'd be great if there can be any sharing of the offline discussion. Thanks!
Holden Karau wrote
> We’re by the registration sign going to start walking over at 4:05
>
> On Wed, Jun 6, 2018 at 2:43 PM Maximiliano Felice <
> maximilianofelice@
>> wrote:
>
>> Hi!
>>
>> Do we meet at the
It seems like Spark can't access hive-site.xml in cluster mode. One solution
is to add the config `spark.yarn.dist.files=/path/to/hive-site.xml` to your
spark-defaults.conf. And don't forget to call `enableHiveSupport()` on the
`SparkSession` builder.
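A minimal sketch of the suggested workaround, assuming YARN cluster mode (the path is a placeholder for wherever your hive-site.xml lives):

```
# spark-defaults.conf -- ship hive-site.xml to the cluster with the app
spark.yarn.dist.files  /path/to/hive-site.xml
```

Then build the session with Hive support enabled, e.g. `SparkSession.builder.enableHiveSupport().getOrCreate()`.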
Tushar Singhal wrote
> Hi Everyone,
>
> I was
Congratulations! Zhenhua Wang
ckson (contributor to MLlib and PySpark)
>>
>> Please join me in welcoming Anirudh, Bryan, Cody, Erik, Matt and Seth as
> committers!
>>
>> Matei
reaming sources must
> be executed with writeStream.start()”?
>
>
> Thanks
> Chang
-
Liang-Chi Hsieh | @viirya
Spark Technology Center
http://www.spark.tc/
Tejas!
>
> Matei
>
def plus(v1, v2):
>> > return v1 + v2
>> >
>> > or we can define as:
>> >
>> > plus = pandas_udf(lambda v1, v2: v1 + v2, DoubleType())
>> >
>> > We can use it similar to row-by-row UDFs:
>> >
>> > df.withColumn('sum'
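To make the quoted fragment concrete: a pandas UDF receives whole columns as `pandas.Series` and returns a `Series` of the same length. Below is a minimal, Spark-free sketch of just the vectorized function (assuming only that pandas is installed; wrapping it with `pandas_udf(..., DoubleType())` and applying it via `df.withColumn(...)` is the PySpark part shown in the quote):

```python
import pandas as pd

def plus(v1: pd.Series, v2: pd.Series) -> pd.Series:
    # Vectorized: operates on whole columns at once, not row by row.
    return v1 + v2

# Simulate the column batches the UDF would receive from Spark.
result = plus(pd.Series([1.0, 2.0, 3.0]), pd.Series([10.0, 20.0, 30.0]))
print(result.tolist())  # [11.0, 22.0, 33.0]
```

This is why such UDFs are much faster than row-by-row Python UDFs: the addition runs once over entire arrays instead of once per row.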
un.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.spark.sql.hive.client.Shim_v0_12.alterTable(HiveShim.scala:399)
> at
> org.apache.spark.sql.hive.client.HiveClientImp
Jerry) Shao as a commi
>>> tter. Saisai has been contributing to many areas of the
>>> project for a long time, so it’s great to see him join.
>>> Join me in thanking and congratulating him!
>>>
>>> Matei
>>> --
t;>>>> On Mon, Aug 7, 2017 at 8:53 AM, Matei Zaharia <
>>>>>>>
> matei.zaharia@
>> wrote:
>>>>>>> > Hi everyone,
>>>>>>> >
>>>>>>> > The Spark PMC recently voted to add Hyukjin Kwon and Sameer
>>>>>>> Agarwal
>>>>>>> as committers. Join me in con
t; here is the code link:
>
>
> https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
>
>
>
> Would you help me understand it, please?
>
>
> Thanks.
> Robin
..
>
>
> It sounds like an environment problem, apparently due to a missing hashtable (which I
> believe should have been compiled and importable properly).
>
> I suspect few possibilities such as a bug somewhere or unsuccessful manual
> build from Pandas source but I am unable to
ps://github.com/apache/spark/blob/8cd9cdf17a7a4ad6f2eecd7c4b388ca363c20982/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L918
>
> Shouldn't lines 925-927 be before 920-922 ?
>
> 2) https://issues.apache.org/jira/browse/SPARK-20392
>
> Is it safe to use it on top of 2.2.0 ?
>
> Regards,
a and b
> in the whole pipeline, even if the result isn't deterministic, but the
> computation is correct.
>
> Thanks
> Chang
>
>
> On Mon, Jul 17, 2017 at 10:49 PM, Liang-Chi Hsieh
> viirya@
> wrote:
>
>>
>> IIUC, the evaluation orde
>>> does the same thing for GroupBy non-deterministic. From Map-Reduce point
>>> of
>>> view, Join is also GroupBy in essence .
>>>
>>> @Liang Chi Hsieh
>>> https://plus.google.com/u/0/103179362592085650735?prsrc=4
>>>
>>> in wh
of
> view, Join is also GroupBy in essence .
>
> @Liang Chi Hsieh
> https://plus.google.com/u/0/103179362592085650735?prsrc=4
>
> in which situation, semantics will be changed?
>
> Thanks
> Chang
>
> On Mon, Jul 17, 2017 at 3:29 PM, Liang-Chi Hsieh
> vi
dious since we have lots of Hive SQL being migrated to Spark.
>> And
>> this workaround is equivalent to insert a Project between Join operator
>> and its child.
>>
>> Why not do it in PullOutNondeterministic?
>>
>> Thanks
>> Chang
>>
>
ch_name#72, vsbl_flg#73, delet_flag#74, etl_batch_id#75L,
>> > updt_time#76, cur_flag#77, bkgrnd_categ_skid#78L, bkgrnd_categ_id#79L,
>> > site_categ_id#80, site_categ_parnt_id#81]
>> >
>> > Does spark sql not support syntax "case when" in JOIN? Additional, my
>> spark
gt;> The documentation corresponding to this release can be found at:
>>>>> https://people.apache.org/~pwendell/spark-releases/spark-
>>>>> 2.2.0-rc6-docs/
>>>>>
>>>>>
>>>>> *FAQ*
>>>>>
>>>>> *H
mean I use kryo with
> more
> than 2000 partitions all the time, and it worked before. Or was I simply
> not hitting this bug because there are other conditions that also need to
> be satisfied besides kryo and 2000+ partitions?
>
> On Jun 19, 2017 2:20 AM, "Liang-Chi
I think it's not. This is a feature added recently.
Hyukjin Kwon wrote
> Is this a regression BTW? I am just curious.
>
> On 19 Jun 2017 1:18 pm, "Liang-Chi Hsieh"
> viirya@
> wrote:
>
> -1. When using the kryo serializer and the partition number is greater than
-1. When using the kryo serializer and the partition number is greater than 2000,
there seems to be an NPE issue that needs to be fixed.
SPARK-21133 <https://issues.apache.org/jira/browse/SPARK-21133>
e how to use that.
>
> Thanks,
> Aviral Agarwal
>
> On Mar 24, 2017 09:20, "Liang-Chi Hsieh"
> viirya@
> wrote:
>
>
> Hi,
>
> You need to resolve the expressions before passing them in to create the
> UnsafeProjection.
>
>
>
> Aviral Agarwal wrote
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> Executor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
> lExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
>
>
> This might be because the Expression is
Just found that you can specify the number of features when loading a libsvm
source:
val df = spark.read.format("libsvm").option("numFeatures", "100").load(path)
Liang-Chi Hsieh wrote
> As the libsvm format can't specify number of features, and looks like
> NaiveBayes
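Not Spark's actual reader — a minimal pure-Python sketch of the libsvm line format, just to show why `numFeatures` has to be supplied: the sparse `index:value` encoding never states the total feature count, so without the option the dimensionality can only be guessed from the highest index seen in the data:

```python
def parse_libsvm_line(line: str, num_features: int):
    # libsvm line: "<label> <index>:<value> ..." with 1-based indices.
    parts = line.split()
    label = float(parts[0])
    vector = [0.0] * num_features
    for item in parts[1:]:
        idx, value = item.split(":")
        vector[int(idx) - 1] = float(value)
    return label, vector

# Without num_features=100, nothing in this line says the vector is 100-dim.
label, features = parse_libsvm_line("1 3:0.5 80:2.0", num_features=100)
print(label, len(features))  # 1.0 100
```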
t; val model = new
>> NaiveBayes().setThresholds(Array(10.0,1.0)).fit(trainingData)
>> val predictions = model.transform(testData)
>> predictions.show()
>>
>>
>> OK, I have got my model with the code above, but how can I use this model
>> to predict the
n of the query, which has several "parent" nodes,
> its "parents" have to reuse it by creating new RDDs?
or RDD[InternalRow]?
set.rdd$lzycompute(Dataset.scala:2544)
> org.apache.spark.sql.Dataset.rdd(Dataset.scala:2544)...
>
>
> The CPU usage of the driver remains 100% like this:
>
>
>
> I didn't find this issue in Spark 1.6.2, what causes this in Spark 2.1.0?
>
>
> Any help is greatl
ark.sql.DataFrame
>>>
>>> def f(data: DataFrame): DataFrame = {
>>> val df = data.filter("id>10")
>>> df.cache
>>> df.count
>>> df
>>> }
>>>
>>> f(spark.range(100).asInstanceOf[DataFrame]).count // output
; (5)
> (6) val rdd2 = loadData2
> (7)
> (8) rdd1.checkpoint()
> (9)
> (10) rdd1
> (11).join(rdd2)
> (12).saveAsObjectFile(...)
>
> /
>
> Thanks in advance,
> Leo
Hi Maciej,
FYI, this fix is submitted at https://github.com/apache/spark/pull/16785.
Liang-Chi Hsieh wrote
> Hi Maciej,
>
> After looking into the details of the time spent on preparing the executed
> plan, the cause of the significant difference between 1.6 and current
>
503 ms
1613 ms
2279 ms
2349 ms
2573 ms
Liang-Chi Hsieh wrote
> Hi Maciej,
>
> Thanks for the info you provided.
>
> I tried to run the same example with 1.6 and current branch and record the
> difference between the time cost on preparing the executed plan.
>
>
ged much in 2.x. They used RDDs for
> fitting in 1.6 and, as far as I can tell, they still do that in 2.x. And
> the problem doesn't look that related to the data processing part in the
> first place.
>
>
> On 02/02/2017 07:22 AM, Liang-Chi Hsieh wrote:
>> Hi Maciej,
>>
&
en 1.6 and 2.0?
>
>
> On Thu, 2 Feb 2017 at 08:22 Liang-Chi Hsieh
> viirya@
> wrote:
>
>>
>> Hi Maciej,
>>
>> FYI, the PR is at https://github.com/apache/spark/pull/16775.
>>
>>
>> Liang-Chi Hsieh wrote
>> > Hi Macie
. The Dataframe looks like
>
> k1 | k2 | k3 | v1
>
> a1 | b1 | c1 | 879
>
> a2 | b2 | c2 | 769
>
> a1 | b1 | c1 | 129
>
> a2 | b2 | c2 | 323
> I need to first run groupBy (k1, k2, k3) and collect_list(v1), and then
> compute quantiles [10th, 50th...] on list
Hi Maciej,
FYI, the PR is at https://github.com/apache/spark/pull/16775.
Liang-Chi Hsieh wrote
> Hi Maciej,
>
> Basically the fitting algorithm in Pipeline is an iterative operation.
> Running iterative algorithm on Dataset would have RDD lineages and query
> plans that gro
t of the time
> idle so it looks like it is a problem with the optimizer. Is it a known
> issue? Are there any changes I've missed, that could lead to this
> behavior?
>
> --
> Best,
> Maciej
>
>