[jira] [Resolved] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31229.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27990
[https://github.com/apache/spark/pull/27990]

> Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> 
> complex types
> ---
>
> Key: SPARK-31229
> URL: https://issues.apache.org/jira/browse/SPARK-31229
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
> Fix For: 3.0.0
>
>
> This JIRA tracks adding the unit tests. It is a follow-up to SPARK-31166 and 
> covers tests for struct and array types against null types.
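(Editor's illustration, not taken from the ticket: the null-vs-complex coercion
paths these tests cover can be seen from spark-shell by casting NULL literals to
complex types; column names below are illustrative.)

{code:scala}
// Casting a NULL literal to array/struct types exercises Cast.canCast for
// null <-> complex types; combining the result with a typed array exercises
// the common-type resolution that TypeCoercion performs.
spark.sql("SELECT CAST(NULL AS ARRAY<INT>) AS a, CAST(NULL AS STRUCT<x: INT>) AS s").printSchema()
spark.sql("SELECT coalesce(CAST(NULL AS ARRAY<INT>), array(1, 2)) AS c").printSchema()
{code}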



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-31229:


Assignee: Hyukjin Kwon

> Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> 
> complex types
> ---
>
> Key: SPARK-31229
> URL: https://issues.apache.org/jira/browse/SPARK-31229
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>
> This JIRA tracks adding the unit tests. It is a follow-up to SPARK-31166 and 
> covers tests for struct and array types against null types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31205) support string literal as the second argument of date_add/date_sub functions

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31205.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27965
[https://github.com/apache/spark/pull/27965]

> support string literal as the second argument of date_add/date_sub functions
> 
>
> Key: SPARK-31205
> URL: https://issues.apache.org/jira/browse/SPARK-31205
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)

2020-03-23 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065283#comment-17065283
 ] 

Hyukjin Kwon commented on SPARK-31209:
--

Please go ahead and upgrade if you're interested.

> Not compatible with new version of scalatest (3.1.0 and above)
> --
>
> Key: SPARK-31209
> URL: https://issues.apache.org/jira/browse/SPARK-31209
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Timothy Zhang
>Priority: Major
>
> Since ScalaTest's style traits and classes were moved and renamed 
> ([http://www.scalatest.org/release_notes/3.1.0]), classes such as FunSpec can no 
> longer be found when a newer version of scalatest is added as a library dependency.
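(Editor's sketch of the migration the linked release notes describe: the style
traits moved into dedicated packages and gained an "Any" prefix. The class names
below come from the ScalaTest 3.1.0 release notes, not from this ticket.)

{code:scala}
// Before ScalaTest 3.1.0:
//   import org.scalatest.FunSuite
//   class MySuite extends FunSuite { ... }
// From 3.1.0 onwards:
import org.scalatest.funsuite.AnyFunSuite

class MySuite extends AnyFunSuite {
  test("example") {
    assert(1 + 1 == 2)
  }
}
{code}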



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause

2020-03-23 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17065239#comment-17065239
 ] 

jiaan.geng commented on SPARK-31210:


Yes, Spark 3.0.0-preview2 does not contain this fix.

> An issue for Spark SQL LIKE-with-ESCAPE clause
> --
>
> Key: SPARK-31210
> URL: https://issues.apache.org/jira/browse/SPARK-31210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mingli Rui
>Priority: Major
>
> I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but it doesn't work in 
> the cases below.
> The database table
> ==
> create or replace table test_table_like ( subject string)
> insert into $test_table_like values ('100 times'), ('1000 times'), ('100%')
>  
> Repro
> 
> val result2 = sparkSession.sql(
>  s"select * from test_table_like where subject like '100^%' escape '^' order 
> by 1")
> "100%" is expected to returned, but it doesn't. I debug into the code to 
> check the logical plan.
> In the logical plan, the LIKE is transformed as "StartsWith(subject#130, 
> 100^)". It looks it is incorrect.
>  
>  
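(Editor's note: a self-contained spark-shell sketch of the repro above, using a
temporary view instead of the reporter's table. Per the report, the expected
result of the escaped LIKE is the single row "100%".)

{code:scala}
// Run from spark-shell, where spark.implicits are already in scope.
Seq("100 times", "1000 times", "100%").toDF("subject").createOrReplaceTempView("test_table_like")

// With '^' declared as the escape character, '100^%' should match only the
// literal string "100%", not everything starting with "100".
spark.sql("SELECT * FROM test_table_like WHERE subject LIKE '100^%' ESCAPE '^' ORDER BY 1").show()
{code}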



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31223) Update py code to generate data in testsuites

2020-03-23 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-31223:
---
Summary: Update py code to generate data in testsuites  (was: Update py 
code to generate dates in testsuites)

> Update py code to generate data in testsuites
> -
>
> Key: SPARK-31223
> URL: https://issues.apache.org/jira/browse/SPARK-31223
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Trivial
>
> In FValueTestSuite/ANOVASelectorSuite/ANOVATestSuite/...:
>  
> the test datasets cannot be regenerated from the given Python code (such as 
> {color:#676773}X = np.random.rand(20, 6){color}), so either:
> 1. create X directly, e.g. X = np.array(...); or
> 2. set a random seed first.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27963) Allow dynamic allocation without an external shuffle service

2020-03-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27963:
--
Labels: release-notes  (was: )

> Allow dynamic allocation without an external shuffle service
> 
>
> Key: SPARK-27963
> URL: https://issues.apache.org/jira/browse/SPARK-27963
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Marcelo Masiero Vanzin
>Assignee: Marcelo Masiero Vanzin
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> It would be useful for users to be able to enable dynamic allocation without 
> the need to provision an external shuffle service. One immediate use case is 
> the ability to use dynamic allocation on Kubernetes, which doesn't yet have 
> that service.
> This has been suggested before (e.g. 
> https://github.com/apache/spark/pull/24083, which was attached to the 
> k8s-specific SPARK-24432), and can actually be done without affecting the 
> internals of the Spark scheduler (aside from the dynamic allocation code). 
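(Editor's sketch of how this is typically switched on. The
spark.dynamicAllocation.shuffleTracking.enabled property name is the editor's
assumption about the flag introduced by this work, not something stated in this
ticket.)

{code:scala}
import org.apache.spark.sql.SparkSession

// Hedged sketch: dynamic allocation with executor-side shuffle tracking,
// so no external shuffle service needs to be provisioned.
val spark = SparkSession.builder()
  .appName("dyn-alloc-without-ess")
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
  .getOrCreate()
{code}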



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31203) Upgrade derby to 10.14.2.0 from 10.12.1.1

2020-03-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31203.
---
Resolution: Later

> Upgrade derby to 10.14.2.0 from 10.12.1.1
> -
>
> Key: SPARK-31203
> URL: https://issues.apache.org/jira/browse/SPARK-31203
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
> Environment: This JIRA is to upgrade the Derby version from 10.12.1.1 to 
> 10.14.2.0.
> The upgrade addresses an already disclosed vulnerability (CVE-2018-1313).
>Reporter: Udbhav Agrawal
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause

2020-03-23 Thread Mingli Rui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064942#comment-17064942
 ] 

Mingli Rui edited comment on SPARK-31210 at 3/23/20, 4:53 PM:
--

Thanks for your investigation! I agree this issue is a duplicate of SPARK-30254. 
The fix for SPARK-30254 was done on Dec 13, 2019, and I am using Spark 
3.0.0-preview2, which was released on Dec 17, 2019. Could you please confirm 
whether the fix is included in Spark 3.0.0-preview2 or not? Thanks a lot!


was (Author: minglirui):
Thanks for you investigation! I agree this issue is duplicated to SPARK-30254. 
The fix for SPARK-30254 is Dec 13 2019. I am using spark 3.0.0-preview2 which 
is released on Dec 17 2019. Could you please confirm whether this fix is 
included in spark 3.0.0-preview2 or not? Thanks a lot!

> An issue for Spark SQL LIKE-with-ESCAPE clause
> --
>
> Key: SPARK-31210
> URL: https://issues.apache.org/jira/browse/SPARK-31210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mingli Rui
>Priority: Major
>
> I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but it doesn't work in 
> the cases below.
> The database table
> ==
> create or replace table test_table_like ( subject string)
> insert into $test_table_like values ('100 times'), ('1000 times'), ('100%')
>  
> Repro
> 
> val result2 = sparkSession.sql(
>  s"select * from test_table_like where subject like '100^%' escape '^' order 
> by 1")
> "100%" is expected to returned, but it doesn't. I debug into the code to 
> check the logical plan.
> In the logical plan, the LIKE is transformed as "StartsWith(subject#130, 
> 100^)". It looks it is incorrect.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31210) An issue for Spark SQL LIKE-with-ESCAPE clause

2020-03-23 Thread Mingli Rui (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064942#comment-17064942
 ] 

Mingli Rui commented on SPARK-31210:


Thanks for your investigation! I agree this issue is a duplicate of SPARK-30254. 
The fix for SPARK-30254 was done on Dec 13, 2019, and I am using Spark 
3.0.0-preview2, which was released on Dec 17, 2019. Could you please confirm 
whether the fix is included in Spark 3.0.0-preview2 or not? Thanks a lot!

> An issue for Spark SQL LIKE-with-ESCAPE clause
> --
>
> Key: SPARK-31210
> URL: https://issues.apache.org/jira/browse/SPARK-31210
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mingli Rui
>Priority: Major
>
> I tried to use LIKE with ESCAPE on Spark 3.0.0-preview2, but it doesn't work in 
> the cases below.
> The database table
> ==
> create or replace table test_table_like ( subject string)
> insert into $test_table_like values ('100 times'), ('1000 times'), ('100%')
>  
> Repro
> 
> val result2 = sparkSession.sql(
>  s"select * from test_table_like where subject like '100^%' escape '^' order 
> by 1")
> "100%" is expected to returned, but it doesn't. I debug into the code to 
> check the logical plan.
> In the logical plan, the LIKE is transformed as "StartsWith(subject#130, 
> 100^)". It looks it is incorrect.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31191) Spark SQL and hive metastore are incompatible

2020-03-23 Thread leishuiyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808
 ] 

leishuiyu edited comment on SPARK-31191 at 3/23/20, 1:41 PM:
-

Hello [~yumwang],

when I edit conf/spark-defaults.conf, this error still happens:
{code:java}
//
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
spark.sql.hive.metastore.version 2.3.0
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
{code}
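(Editor's aside, not part of the original comment: changing
spark.sql.hive.metastore.version is normally paired with
spark.sql.hive.metastore.jars. Both property names are documented Spark settings,
but the values below are illustrative and are not confirmed to resolve this report.)

{code:scala}
import org.apache.spark.sql.SparkSession

// Hedged sketch: point Spark at a specific Hive metastore client version and
// tell it where to get the matching client jars ("maven" downloads them).
val spark = SparkSession.builder()
  .appName("hive-metastore-version-sketch")
  .config("spark.sql.hive.metastore.version", "2.3.3")
  .config("spark.sql.hive.metastore.jars", "maven")
  .enableHiveSupport()
  .getOrCreate()
{code}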


was (Author: leishuiyu):
when edit conf/spark-defaults.conf  ,this error still happen
{code:java}
//
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
spark.sql.hive.metastore.version 2.3.0
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
{code}

> Spark SQL and hive metastore are incompatible
> -
>
> Key: SPARK-31191
> URL: https://issues.apache.org/jira/browse/SPARK-31191
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment:  the spark version 2.3.0
> the hive version 2.3.3
>Reporter: leishuiyu
>Priority: Major
> Attachments: image-2020-03-23-21-37-17-663.png
>
>
> # 
> h3. When I execute bin/spark-sql, an exception occurs
>  
> {code:java}
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>  at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
> ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> 

[jira] [Comment Edited] (SPARK-31191) Spark SQL and hive metastore are incompatible

2020-03-23 Thread leishuiyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808
 ] 

leishuiyu edited comment on SPARK-31191 at 3/23/20, 1:40 PM:
-

when I edit conf/spark-defaults.conf, this error still happens:
{code:java}
//
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
## Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.
spark.sql.hive.metastore.version 2.3.0
# Example:
# spark.master spark://master:7077
# spark.eventLog.enabled   true
# spark.eventLog.dir   hdfs://namenode:8021/directory
# spark.serializer org.apache.spark.serializer.KryoSerializer
# spark.driver.memory  5g
# spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value 
-Dnumbers="one two three"
{code}


was (Author: leishuiyu):
when edit conf/spark-defaults.conf  ,this error still happens

!image-2020-03-23-21-37-17-663.png!

> Spark SQL and hive metastore are incompatible
> -
>
> Key: SPARK-31191
> URL: https://issues.apache.org/jira/browse/SPARK-31191
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment:  the spark version 2.3.0
> the hive version 2.3.3
>Reporter: leishuiyu
>Priority: Major
> Attachments: image-2020-03-23-21-37-17-663.png
>
>
> # 
> h3. When I execute bin/spark-sql, an exception occurs
>  
> {code:java}
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>  at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
> ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>  ... 18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does 
> not match metastore's schema version 2.3.0 Metastore is not upgraded or 
> corrupt) at 
> org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) 
> at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>  at 
> 

[jira] [Commented] (SPARK-31191) Spark SQL and hive metastore are incompatible

2020-03-23 Thread leishuiyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064808#comment-17064808
 ] 

leishuiyu commented on SPARK-31191:
---

when I edit conf/spark-defaults.conf, this error still happens:

!image-2020-03-23-21-37-17-663.png!

> Spark SQL and hive metastore are incompatible
> -
>
> Key: SPARK-31191
> URL: https://issues.apache.org/jira/browse/SPARK-31191
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment:  the spark version 2.3.0
> the hive version 2.3.3
>Reporter: leishuiyu
>Priority: Major
> Attachments: image-2020-03-23-21-37-17-663.png
>
>
> # 
> h3. When I execute bin/spark-sql, an exception occurs
>  
> {code:java}
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>  at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
> ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>  ... 18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does 
> not match metastore's schema version 2.3.0 Metastore is not upgraded or 
> corrupt) at 
> org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) 
> at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:66)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:199)
>  at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
>  ... 23 more
> {code}
> h3. 2. Find the reason
>  Looking at the source code, the Spark jars directory contains 
> hive-metastore-1.2.1.spark2.jar;
>  version 1.2.1 maps to schema version 1.2.0, hence the exception.
>   
>  
> {code:java}
> // code placeholder
> private static final Map<String, String> EQUIVALENT_VERSIONS =
> ImmutableMap.of("0.13.1", "0.13.0",
> "1.0.0", "0.14.0",
> "1.0.1", "1.0.0",
> "1.1.1", "1.1.0",
> "1.2.1", "1.2.0"
> );
> {code}
>  
> h3. 3. Is there any solution to this problem?
>    One can set hive.metastore.schema.verification to true in hive-site.xml, but 
> new problems may arise.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31191) Spark SQL and hive metastore are incompatible

2020-03-23 Thread leishuiyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

leishuiyu updated SPARK-31191:
--
Attachment: image-2020-03-23-21-37-17-663.png

> Spark SQL and hive metastore are incompatible
> -
>
> Key: SPARK-31191
> URL: https://issues.apache.org/jira/browse/SPARK-31191
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
> Environment:  the spark version 2.3.0
> the hive version 2.3.3
>Reporter: leishuiyu
>Priority: Major
> Attachments: image-2020-03-23-21-37-17-663.png
>
>
> # 
> h3. When I execute bin/spark-sql, an exception occurs
>  
> {code:java}
> Caused by: java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClientCaused by: 
> java.lang.RuntimeException: Unable to instantiate 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:86)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
>  at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
>  at 
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005) 
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024) at 
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503) 
> ... 12 moreCaused by: java.lang.reflect.InvocationTargetException at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
>  ... 18 moreCaused by: MetaException(message:Hive Schema version 1.2.0 does 
> not match metastore's schema version 2.3.0 Metastore is not upgraded or 
> corrupt) at 
> org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6679)
>  at 
> org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114) 
> at com.sun.proxy.$Proxy6.verifySchema(Unknown Source) at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:66)
>  at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
>  at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:199)
>  at 
> org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.(SessionHiveMetaStoreClient.java:74)
>  ... 23 more
> {code}
> h3. 2. Find the reason
>  Looking at the source code, the Spark jars directory contains 
> hive-metastore-1.2.1.spark2.jar;
>  version 1.2.1 maps to schema version 1.2.0, hence the exception.
>   
>  
> {code:java}
> // code placeholder
> private static final Map<String, String> EQUIVALENT_VERSIONS =
> ImmutableMap.of("0.13.1", "0.13.0",
> "1.0.0", "0.14.0",
> "1.0.1", "1.0.0",
> "1.1.1", "1.1.0",
> "1.2.1", "1.2.0"
> );
> {code}
>  
> h3. 3. Is there any solution to this problem?
>    One can set hive.metastore.schema.verification to true in hive-site.xml, but 
> new problems may arise.
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31230) use statement plans in DataFrameWriter(V2)

2020-03-23 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-31230:
---

 Summary: use statement plans in DataFrameWriter(V2)
 Key: SPARK-31230
 URL: https://issues.apache.org/jira/browse/SPARK-31230
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types

2020-03-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-31229:


 Summary: Add unit tests TypeCoercion.findTypeForComplex and 
Cast.canCast about null <> complex types
 Key: SPARK-31229
 URL: https://issues.apache.org/jira/browse/SPARK-31229
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon


This JIRA tracks adding the unit tests. It is a follow-up to SPARK-31166 and 
covers tests for struct and array types against null types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31229) Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> complex types

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-31229:
-
Priority: Minor  (was: Major)

> Add unit tests TypeCoercion.findTypeForComplex and Cast.canCast about null <> 
> complex types
> ---
>
> Key: SPARK-31229
> URL: https://issues.apache.org/jira/browse/SPARK-31229
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> This JIRA tracks adding the unit tests. It is a follow-up to SPARK-31166 and 
> covers tests for struct and array types against null types.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31228) Add version information to the configuration of Kafka

2020-03-23 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-31228:
--

 Summary: Add version information to the configuration of Kafka
 Key: SPARK-31228
 URL: https://issues.apache.org/jira/browse/SPARK-31228
 Project: Spark
  Issue Type: Sub-task
  Components: DStreams
Affects Versions: 3.1.0
Reporter: jiaan.geng


 
external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/package.scala



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31227) Non-nullable null type should not coerce to nullable type

2020-03-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-31227:


 Summary: Non-nullable null type should not coerce to nullable type
 Key: SPARK-31227
 URL: https://issues.apache.org/jira/browse/SPARK-31227
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Hyukjin Kwon


{code}
scala> spark.range(10).selectExpr("array()").printSchema()
root
 |-- array(): array (nullable = false)
 ||-- element: null (containsNull = false)


scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema()
root
 |-- arr: array (nullable = false)
 ||-- element: null (containsNull = false)


scala> spark.range(10).selectExpr("concat(array(), array(1)) as 
arr").printSchema()
root
 |-- arr: array (nullable = false)
 ||-- element: integer (containsNull = true)
{code}

The last case should not coerce to nullable type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31227) Non-nullable null type should not coerce to nullable type

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-31227:
-
Affects Version/s: (was: 3.0.0)
   3.1.0

> Non-nullable null type should not coerce to nullable type
> -
>
> Key: SPARK-31227
> URL: https://issues.apache.org/jira/browse/SPARK-31227
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> {code}
> scala> spark.range(10).selectExpr("array()").printSchema()
> root
>  |-- array(): array (nullable = false)
>  ||-- element: null (containsNull = false)
> scala> spark.range(10).selectExpr("concat(array()) as arr").printSchema()
> root
>  |-- arr: array (nullable = false)
>  ||-- element: null (containsNull = false)
> scala> spark.range(10).selectExpr("concat(array(), array(1)) as 
> arr").printSchema()
> root
>  |-- arr: array (nullable = false)
>  ||-- element: integer (containsNull = true)
> {code}
> The last case should not coerce to nullable type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-31194) spark sql runs successfully with query not specifying condition next to where

2020-03-23 Thread Ayoub Omari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayoub Omari closed SPARK-31194.
---

WHERE works as an alias for the table name in this case.
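(Editor's illustration of the explanation above, assuming the Spark 2.4.5 behavior
described in the report: the trailing WHERE is parsed as a table alias, so the
statement behaves like SELECT * FROM people AS where.)

{code:scala}
// Sketch: the query parses because "WHERE" is taken as an alias, not a clause.
spark.range(3).toDF("id").createOrReplaceTempView("people")
spark.sql("SELECT * FROM people WHERE").show()   // same rows as SELECT * FROM people
{code}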

> spark sql runs successfully with query not specifying condition next to where 
> --
>
> Key: SPARK-31194
> URL: https://issues.apache.org/jira/browse/SPARK-31194
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Ayoub Omari
>Priority: Major
>
> When having a sql query as follows:
> {color:#00875a}_SELECT *_{color}
> {color:#00875a}_FROM people_{color}
> {color:#00875a}_WHERE_{color}
> shouldn't we throw a parsing exception because of the unspecified _condition_?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31194) spark sql runs successfully with query not specifying condition next to where

2020-03-23 Thread Ayoub Omari (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064692#comment-17064692
 ] 

Ayoub Omari edited comment on SPARK-31194 at 3/23/20, 10:28 AM:


Oh, ok! Closing, as WHERE works as an alias for the table name in this case.


was (Author: sarmon):
Where works as an alias to tableName in this case 

> spark sql runs successfully with query not specifying condition next to where 
> --
>
> Key: SPARK-31194
> URL: https://issues.apache.org/jira/browse/SPARK-31194
> Project: Spark
>  Issue Type: Story
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Ayoub Omari
>Priority: Major
>
> When having a sql query as follows:
> {color:#00875a}_SELECT *_{color}
> {color:#00875a}_FROM people_{color}
> {color:#00875a}_WHERE_{color}
> shouldn't we throw a parsing exception because of the unspecified _condition_?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31219) YarnShuffleService doesn't close idle netty channel

2020-03-23 Thread Manu Zhang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manu Zhang updated SPARK-31219:
---
Description: 
Recently, we find our YarnShuffleService has a lot of [half-open 
connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html]
 where shuffle servers' connections are active while clients have already 
closed. 

For example, from server's `ss -nt sport = :7337` output we have
{code:java}
ESTAB 0 0 server:7337 client:port

{code}
However, on client `ss -nt dport =: 7337 | grep server` would return nothing.

Looking at the code,  `YarnShuffleService` creates a `TransportContext` with 
`closeIdleConnections` set to false.
{code:java}
public class YarnShuffleService extends AuxiliaryService {
  ...
  @Override  protected void serviceInit(Configuration conf) throws Exception { 
... 
transportContext = new TransportContext(transportConf, blockHandler); 
...
  }
  ...
}

public class TransportContext implements Closeable {
  ...

  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {   
this(conf, rpcHandler, false, false);  
  }
  public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean 
closeIdleConnections) {
this(conf, rpcHandler, closeIdleConnections, false);  
  }
  ...
}{code}
Hence, it's possible the channel  may never get closed at server side if the 
server misses the event that the client has closed it.

I find that parameter is true for `ExternalShuffleService`.

Is there any reason for the difference here ?  Can we enable 
closeIdleConnections in YarnShuffleService or at least add a configuration to 
enable it ?

 

  was:
Recently, we find our YarnShuffleService has a lot of [half-open 
connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html]
 where shuffle servers' connections are active while clients have already 
closed. 

For example, from server's `ss -nt sport = :7337` output we have
{code:java}
ESTAB 0 0 server:7337 client:port

{code}
However, on client `ss -nt dport =: 7337 | grep server` would return nothing.

Looking at the code,  `YarnShuffleService` creates a `TransportContext` with 
`closeIdleConnections` set to false.
{code:java}
public class YarnShuffleService extends AuxiliaryService {
  ...
  @Override  protected void serviceInit(Configuration conf) throws Exception { 
... 
transportContext = new TransportContext(transportConf, blockHandler); 
...
  }
  ...
}

public class TransportContext implements Closeable {
  ...

  public TransportContext(TransportConf conf, RpcHandler rpcHandler) {   
this(conf, rpcHandler, false, false);  
  }
  public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean 
closeIdleConnections) {
this(conf, rpcHandler, closeIdleConnections, false);  
  }
  ...
}{code}
Hence, it's possible the channel  may never get closed at server side if the 
server misses the event that the client has closed it.

I find that parameter is true for `ExternalShuffleService`.

Is there any reason for the difference here ? Will it be valuable to add a 
configuration to allow enabling closeIdleConnections ?

 


> YarnShuffleService doesn't close idle netty channel
> ---
>
> Key: SPARK-31219
> URL: https://issues.apache.org/jira/browse/SPARK-31219
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Manu Zhang
>Priority: Major
>
> Recently, we find our YarnShuffleService has a lot of [half-open 
> connections|https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html]
>  where shuffle servers' connections are active while clients have already 
> closed. 
> For example, from server's `ss -nt sport = :7337` output we have
> {code:java}
> ESTAB 0 0 server:7337 client:port
> {code}
> However, on client `ss -nt dport =: 7337 | grep server` would return nothing.
> Looking at the code,  `YarnShuffleService` creates a `TransportContext` with 
> `closeIdleConnections` set to false.
> {code:java}
> public class YarnShuffleService extends AuxiliaryService {
>   ...
>   @Override  protected void serviceInit(Configuration conf) throws Exception 
> { 
> ... 
> transportContext = new TransportContext(transportConf, blockHandler); 
> ...
>   }
>   ...
> }
> public class TransportContext implements Closeable {
>   ...
>   public TransportContext(TransportConf conf, RpcHandler rpcHandler) {   
> this(conf, rpcHandler, false, false);  
>   }
>   public TransportContext(TransportConf conf, RpcHandler rpcHandler, boolean 
> closeIdleConnections) {
> this(conf, rpcHandler, closeIdleConnections, false);  
>   }
>   ...
> }{code}
> Hence, it's possible the channel  may never get closed at 

[jira] [Created] (SPARK-31226) SizeBasedCoalesce logic error

2020-03-23 Thread angerszhu (Jira)
angerszhu created SPARK-31226:
-

 Summary: SizeBasedCoalesce logic error
 Key: SPARK-31226
 URL: https://issues.apache.org/jira/browse/SPARK-31226
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0, 3.0.0
Reporter: angerszhu






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31226) SizeBasedCoalesce logic error

2020-03-23 Thread angerszhu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-31226:
--
Description: 
In Spark's unit tests,

SizeBasedCoalesce's logic is wrong.

> SizeBasedCoalesce logic error
> -
>
> Key: SPARK-31226
> URL: https://issues.apache.org/jira/browse/SPARK-31226
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0
>Reporter: angerszhu
>Priority: Minor
>
> In Spark's unit tests, 
> SizeBasedCoalesce's logic is wrong.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31190) ScalaReflection should not erasure user defined AnyVal type

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-31190:

Summary: ScalaReflection should not erasure user defined AnyVal type  (was: 
ScalaReflection should erasure non user defined AnyVal type)

> ScalaReflection should not erasure user defined AnyVal type
> ---
>
> Key: SPARK-31190
> URL: https://issues.apache.org/jira/browse/SPARK-31190
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.0
>
>
> We should skip erasure only for user-defined AnyVal types, but still do erasure 
> for other types, e.g. Any, which gives a better error message to the end user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31190) ScalaReflection should erasure non user defined AnyVal type

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31190.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27959
[https://github.com/apache/spark/pull/27959]

> ScalaReflection should erasure non user defined AnyVal type
> ---
>
> Key: SPARK-31190
> URL: https://issues.apache.org/jira/browse/SPARK-31190
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.0
>
>
> We should skip erasure only for user-defined AnyVal types, but still do erasure 
> for other types, e.g. Any, which gives a better error message to the end user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31190) ScalaReflection should erasure non user defined AnyVal type

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31190:
---

Assignee: wuyi

> ScalaReflection should erasure non user defined AnyVal type
> ---
>
> Key: SPARK-31190
> URL: https://issues.apache.org/jira/browse/SPARK-31190
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>
> We should skip erasure only for user-defined AnyVal types, but still do erasure 
> for other types, e.g. Any, which gives a better error message to the end user.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31220) repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum when spark.sql.adaptive.enabled

2020-03-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-31220:

Summary: repartition obeys 
spark.sql.adaptive.coalescePartitions.initialPartitionNum when 
spark.sql.adaptive.enabled  (was: distribute by obeys 
spark.sql.adaptive.coalescePartitions.initialPartitionNum when 
spark.sql.adaptive.enabled)

> repartition obeys spark.sql.adaptive.coalescePartitions.initialPartitionNum 
> when spark.sql.adaptive.enabled
> ---
>
> Key: SPARK-31220
> URL: https://issues.apache.org/jira/browse/SPARK-31220
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> {code:scala}
> spark.sql("CREATE TABLE spark_31220(id int)")
> spark.sql("set 
> spark.sql.adaptive.coalescePartitions.initialPartitionNum=1000")
> spark.sql("set spark.sql.adaptive.enabled=true")
> {code}
> {noformat}
> scala> spark.sql("SELECT id from spark_31220 GROUP BY id").explain
> == Physical Plan ==
> AdaptiveSparkPlan(isFinalPlan=false)
> +- HashAggregate(keys=[id#5], functions=[])
>+- Exchange hashpartitioning(id#5, 1000), true, [id=#171]
>   +- HashAggregate(keys=[id#5], functions=[])
>  +- FileScan parquet default.spark_31220[id#5] Batched: true, 
> DataFilters: [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/root/opensource/apache-spark/spark-warehouse/spark_31220],
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> scala> spark.sql("SELECT id from spark_31220 DISTRIBUTE BY id").explain
> == Physical Plan ==
> AdaptiveSparkPlan(isFinalPlan=false)
> +- Exchange hashpartitioning(id#5, 200), false, [id=#179]
>+- FileScan parquet default.spark_31220[id#5] Batched: true, DataFilters: 
> [], Format: Parquet, Location: 
> InMemoryFileIndex[file:/root/opensource/apache-spark/spark-warehouse/spark_31220],
>  PartitionFilters: [], PushedFilters: [], ReadSchema: struct
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31225) Override `sql` method for OuterReference

2020-03-23 Thread Kent Yao (Jira)
Kent Yao created SPARK-31225:


 Summary: Override `sql` method for  OuterReference
 Key: SPARK-31225
 URL: https://issues.apache.org/jira/browse/SPARK-31225
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Kent Yao


OuterReference is a LeafExpression, so its children are Nil, which makes its SQL 
representation always be outer(). This makes our explain command output and error 
messages unclear when an OuterReference exists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31211.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27974
[https://github.com/apache/spark/pull/27974]

> Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
> ---
>
> Key: SPARK-31211
> URL: https://issues.apache.org/jira/browse/SPARK-31211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Save valid date in Julian calendar by Spark 2.4.5 in a leap year, for 
> instance 1000-02-29:
> {code}
> $ export TZ="America/Los_Angeles"
> {code}
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> 
> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")
> scala> val df = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> scala> 
> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
> {code}
> Load the parquet files back by Spark 3.1.0-SNAPSHOT:
> {code:scala}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
>   /_/
> Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_231)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> +--+
> |  date|
> +--+
> |1000-03-06|
> +--+
> scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a 
> leap year
>   at java.time.LocalDate.create(LocalDate.java:429)
>   at java.time.LocalDate.of(LocalDate.java:269)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31211) Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5

2020-03-23 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31211:
---

Assignee: Maxim Gekk

> Failure on loading 1000-02-29 from parquet saved by Spark 2.4.5
> ---
>
> Key: SPARK-31211
> URL: https://issues.apache.org/jira/browse/SPARK-31211
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Save valid date in Julian calendar by Spark 2.4.5 in a leap year, for 
> instance 1000-02-29:
> {code}
> $ export TZ="America/Los_Angeles"
> {code}
> {code:scala}
> scala> spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")
> scala> 
> df.write.mode("overwrite").format("avro").save("/Users/maxim/tmp/before_1582/2_4_5_date_avro_leap")
> scala> val df = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> +--+
> |  date|
> +--+
> |1000-02-29|
> +--+
> scala> 
> df.write.mode("overwrite").parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap")
> {code}
> Load the parquet files back by Spark 3.1.0-SNAPSHOT:
> {code:scala}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/___/ .__/\_,_/_/ /_/\_\   version 3.1.0-SNAPSHOT
>   /_/
> Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 
> 1.8.0_231)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> +--+
> |  date|
> +--+
> |1000-03-06|
> +--+
> scala> spark.conf.set("spark.sql.legacy.parquet.rebaseDateTime.enabled", true)
> scala> spark.read.parquet("/Users/maxim/tmp/before_1582/2_4_5_date_leap").show
> 20/03/21 03:03:59 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
> java.time.DateTimeException: Invalid date 'February 29' as '1000' is not a 
> leap year
>   at java.time.LocalDate.create(LocalDate.java:429)
>   at java.time.LocalDate.of(LocalDate.java:269)
>   at 
> org.apache.spark.sql.catalyst.util.DateTimeUtils$.rebaseJulianToGregorianDays(DateTimeUtils.scala:1008)
> {code}
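
A minimal, standalone sketch of the rebase being discussed (this is not Spark's 
actual DateTimeUtils code): interpret the day count written by Spark 2.4 with the 
legacy hybrid Julian calendar, then re-encode the resulting local date as a 
proleptic Gregorian day count. 1000-02-29 exists only in the Julian calendar, so 
LocalDate.of(1000, 2, 29) throws exactly the DateTimeException shown in the stack 
trace above.

{code:scala}
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Sketch only: the name mirrors the Spark method for readability, but this is
// not its implementation.
def rebaseJulianToGregorianDaysSketch(julianEpochDays: Int): Long = {
  // Interpret the stored day count with the hybrid Julian/Gregorian calendar
  // that Spark 2.4 used when writing.
  val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
  cal.clear()
  cal.set(1970, Calendar.JANUARY, 1)
  cal.add(Calendar.DAY_OF_MONTH, julianEpochDays)
  val year = cal.get(Calendar.YEAR)
  val month = cal.get(Calendar.MONTH) + 1
  val day = cal.get(Calendar.DAY_OF_MONTH)
  // Re-encode the same local date in the proleptic Gregorian calendar used by
  // Spark 3.x. For 1000-02-29 this throws:
  // "Invalid date 'February 29' as '1000' is not a leap year".
  LocalDate.of(year, month, day).toEpochDay
}
{code}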



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31224) Support views in both SHOW CREATE TABLE and SHOW CREATE TABLE AS SERDE

2020-03-23 Thread L. C. Hsieh (Jira)
L. C. Hsieh created SPARK-31224:
---

 Summary: Support views in both SHOW CREATE TABLE and SHOW CREATE 
TABLE AS SERDE
 Key: SPARK-31224
 URL: https://issues.apache.org/jira/browse/SPARK-31224
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh


For now the {{SHOW CREATE TABLE}} command doesn't support views, but {{SHOW CREATE 
TABLE ... AS SERDE}} does. Since the view syntax is the same between Hive DDL and 
Spark DDL, we should be able to support views in both commands.
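
A small illustration of the gap, assuming a hypothetical view named {{v1}}; per 
the description above, only the AS SERDE variant handles views today.

{code:scala}
// Hypothetical view, used only for illustration.
spark.sql("CREATE OR REPLACE VIEW v1 AS SELECT 1 AS id")

// Supported today: emits Hive-compatible DDL, including for views.
spark.sql("SHOW CREATE TABLE v1 AS SERDE").show(truncate = false)

// Currently not supported for views; this ticket proposes supporting it as well.
spark.sql("SHOW CREATE TABLE v1").show(truncate = false)
{code}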



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31173) Spark Kubernetes add tolerations and nodeName support

2020-03-23 Thread zhongwei liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064563#comment-17064563
 ] 

zhongwei liu commented on SPARK-31173:
--

[~seedjeffwan] The first one is the key point.

> Spark Kubernetes add tolerations and nodeName support
> -
>
> Key: SPARK-31173
> URL: https://issues.apache.org/jira/browse/SPARK-31173
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0, 2.4.6
> Environment: Alibaba Cloud ACK with spark 
> operator(v1beta2-1.1.0-2.4.5) and spark(2.4.5)
>Reporter: zhongwei liu
>Priority: Trivial
>  Labels: features
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> When you run Spark on a serverless Kubernetes cluster (virtual-kubelet), you 
> need to specify nodeSelectors, tolerations, and even nodeName to get good 
> scheduling performance. Currently Spark doesn't support tolerations; to use them 
> you must decorate the pods with an admission-controller webhook, and the 
> performance of that approach is extremely bad. Here is the benchmark:
> With webhook: batch size 500, pod creation about 7 pods/s, all pods running 
> after 5 min.
> Without webhook: batch size 500, pod creation more than 500 pods/s, all pods 
> running after 45 s.
> Adding tolerations and nodeName support to Spark would be a great help for 
> running large-scale jobs on a serverless Kubernetes cluster.
>  
>  
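
For context, a hedged sketch of what is already possible without the webhook: node 
placement can go through the existing node-selector conf, while tolerations have 
to come from pod template files (Spark 3.0+) because there is no dedicated conf 
for them yet. The label value and template paths below are placeholders, not 
values from this ticket.

{code:scala}
import org.apache.spark.SparkConf

// These would normally be passed via spark-submit --conf; shown here as a sketch.
val conf = new SparkConf()
  // Existing conf: adds a nodeSelector to driver and executor pods.
  .set("spark.kubernetes.node.selector.type", "virtual-kubelet")
  // Tolerations are currently only expressible through pod templates (Spark 3.0+).
  .set("spark.kubernetes.driver.podTemplateFile", "/opt/templates/driver-tolerations.yaml")
  .set("spark.kubernetes.executor.podTemplateFile", "/opt/templates/executor-tolerations.yaml")
{code}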



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31209) Not compatible with new version of scalatest (3.1.0 and above)

2020-03-23 Thread Timothy Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064561#comment-17064561
 ] 

Timothy Zhang commented on SPARK-31209:
---

Yes, I think so. It would be better to upgrade all classes that extend scalatest's 
FunSuite, FlatSpec, etc.

> Not compatible with new version of scalatest (3.1.0 and above)
> --
>
> Key: SPARK-31209
> URL: https://issues.apache.org/jira/browse/SPARK-31209
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Timothy Zhang
>Priority: Major
>
> Since ScalaTest's style traits and classes were moved and renamed 
> ([http://www.scalatest.org/release_notes/3.1.0]), classes such as FunSpec can no 
> longer be found when I add the new version of scalatest as a library dependency.
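
For reference, a minimal sketch of the rename in question, following the scalatest 
3.1.0 release notes; the suite name and test body are made up.

{code:scala}
// scalatest 3.0.x style (what the existing suites extend):
//   import org.scalatest.FunSuite
//   class MySuite extends FunSuite { ... }

// scalatest 3.1.x and above: the style traits moved into per-style packages and
// gained an "Any" prefix.
import org.scalatest.funsuite.AnyFunSuite

class MySuite extends AnyFunSuite {
  test("addition still works") {
    assert(1 + 1 === 2)
  }
}
{code}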



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31132) Optimized Logical Plan cast('' as timestamp) is null

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31132.
--
Resolution: Incomplete

2.3.x is EOLed. Please try this in higher versions and also show reproducible 
steps.

> Optimized Logical Plan  cast('' as timestamp) is null
> -
>
> Key: SPARK-31132
> URL: https://issues.apache.org/jira/browse/SPARK-31132
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: duyf
>Priority: Major
> Attachments: image-2020-03-12-18-34-22-571.png
>
>
> The type of receive_time is timestamp, and its value is '2020-02-10'.
> SQL:
> select case when receive_time <> ''
>  then 1 else 0 end as tt
>  from xxx.xxx
>  where order_id = 1234;
> The output is 0, which is wrong.
> This is because cast('' as timestamp) is null, so receive_time <> '' is never true.
>  
> !image-2020-03-12-18-34-22-571.png!
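
A minimal, self-contained reproduction sketch of the reported behaviour; the 
column name is kept from the description, while the table and filter are replaced 
by an inline DataFrame.

{code:scala}
// In spark-shell; a standalone app would also need `import spark.implicits._`.
spark.sql("SELECT CAST('' AS TIMESTAMP) IS NULL").show()   // true

val df = Seq(java.sql.Timestamp.valueOf("2020-02-10 00:00:00")).toDF("receive_time")

// receive_time <> '' becomes a comparison with CAST('' AS TIMESTAMP), i.e. with
// NULL, which is never true, so the CASE falls through to 0.
df.selectExpr("CASE WHEN receive_time <> '' THEN 1 ELSE 0 END AS tt").show()
{code}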



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31140) Support Quick sample in RDD

2020-03-23 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064557#comment-17064557
 ] 

Hyukjin Kwon commented on SPARK-31140:
--

This seems easy enough to work around. Also, given that the RDD API is essentially 
frozen now, I don't think it's worth adding.

> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample uses *filter* to pick out the rows we need, which means that when 
> the raw data is very large we spend too much time reading all of it. We could 
> instead filter out whole partitions up front to speed up sampling.
> {code:scala}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}
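
A hedged sketch of the partition-level shortcut the ticket suggests, which is also 
the kind of user-side workaround alluded to in the comment above: keep only a 
random subset of partitions before row-level sampling, so most partitions are 
never read. The helper name and fractions are made up; this is not a proposed 
Spark API.

{code:scala}
import scala.reflect.ClassTag
import scala.util.Random

import org.apache.spark.rdd.RDD

// Sketch only: drop whole partitions first, then sample rows in the survivors.
// The overall expected fraction is roughly partitionFraction * rowFraction, and
// the result is biased towards whatever data lives in the kept partitions.
def quickSample[T: ClassTag](
    rdd: RDD[T],
    partitionFraction: Double,
    rowFraction: Double,
    seed: Long): RDD[T] = {
  val rng = new Random(seed)
  val kept = (0 until rdd.getNumPartitions)
    .filter(_ => rng.nextDouble() < partitionFraction)
    .toSet
  rdd.mapPartitionsWithIndex(
    (idx, iter) => if (kept(idx)) iter else Iterator.empty,
    preservesPartitioning = true
  ).sample(withReplacement = false, rowFraction, seed)
}
{code}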



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31140) Support Quick sample in RDD

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31140.
--
Resolution: Won't Fix

> Support Quick sample in RDD
> ---
>
> Key: SPARK-31140
> URL: https://issues.apache.org/jira/browse/SPARK-31140
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.0
>Reporter: deshanxiao
>Priority: Minor
>
> RDD.sample uses *filter* to pick out the rows we need, which means that when 
> the raw data is very large we spend too much time reading all of it. We could 
> instead filter out whole partitions up front to speed up sampling.
> {code:scala}
>   override def compute(splitIn: Partition, context: TaskContext): Iterator[U] 
> = {
> val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
> val thisSampler = sampler.clone
> thisSampler.setSeed(split.seed)
> thisSampler.sample(firstParent[T].iterator(split.prev, context))
>   }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31152) Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()

2020-03-23 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17064556#comment-17064556
 ] 

Hyukjin Kwon commented on SPARK-31152:
--

That error isn't from Spark; nothing here suggests the issue is in Spark.

> Issue with Spark Context Initialization  i.e. 
> SparkSession.builder.getOrCreate()
> 
>
> Key: SPARK-31152
> URL: https://issues.apache.org/jira/browse/SPARK-31152
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5, 3.0.0
> Environment: OS : Windows 10 
> Spark : spark-3.0.0-preview2-bin-hadoop2.7
>  
> Env_variables : 
> SPARK_HOME : 
> C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> HADOOP_HOME : 
> C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> JAVA_HOME : C:\Program Files\Java\jdk1.8.0_191 ; C:\Program 
> Files\Java\jre1.8.0_241\bin
> PYTHON VERSION : Python 3.7.1
> ANACONDA VERSION : conda 4.8.2
>  
> I am running this PySpark code locally (no Hadoop setup) to develop an NLP 
> script. 
>  
>Reporter: Rohith Bhattaram
>Priority: Major
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Issue:
> I am trying to initialize the Spark context using the code below:
>  
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> import pandas as pd
> sc = spark.sparkContext
>  
> At the getOrCreate() step, the code hangs indefinitely and never returns, with 
> neither an exception nor a timeout.
>  
> On execution, if I check the VS terminal or the Anaconda command prompt, the 
> following statement is displayed:
> [I 19:13:47.973 NotebookApp] Saving file at /LearnPython/Assignment3/NLP.ipynb
> *The filename, directory name, or volume label syntax is incorrect.*
>  
> Note: this used to work fine until a month ago; I am not sure what changed, but 
> it stopped working yesterday.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31152) Issue with Spark Context Initialization i.e. SparkSession.builder.getOrCreate()

2020-03-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31152.
--
Resolution: Invalid

> Issue with Spark Context Initialization  i.e. 
> SparkSession.builder.getOrCreate()
> 
>
> Key: SPARK-31152
> URL: https://issues.apache.org/jira/browse/SPARK-31152
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.5, 3.0.0
> Environment: OS : Windows 10 
> Spark : spark-3.0.0-preview2-bin-hadoop2.7
>  
> Env_variables : 
> SPARK_HOME : 
> C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> HADOOP_HOME : 
> C:\Users\rohit\spark\spark-3.0.0-preview2-bin-hadoop2.7\spark-3.0.0-preview2-bin-hadoop2.7
> JAVA_HOME : C:\Program Files\Java\jdk1.8.0_191 ; C:\Program 
> Files\Java\jre1.8.0_241\bin
> PYTHON VERSION : Python 3.7.1
> ANACONDA VERSION : conda 4.8.2
>  
> I am running this PySpark code locally (no Hadoop setup) to develop an NLP 
> script. 
>  
>Reporter: Rohith Bhattaram
>Priority: Major
>   Original Estimate: 20h
>  Remaining Estimate: 20h
>
> Issue:
> I am trying to initialize the Spark context using the code below:
>  
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> import pandas as pd
> sc = spark.sparkContext
>  
> At the getOrCreate() step, the code hangs indefinitely and never returns, with 
> neither an exception nor a timeout.
>  
> On execution, if I check the VS terminal or the Anaconda command prompt, the 
> following statement is displayed:
> [I 19:13:47.973 NotebookApp] Saving file at /LearnPython/Assignment3/NLP.ipynb
> *The filename, directory name, or volume label syntax is incorrect.*
>  
> Note: this used to work fine until a month ago; I am not sure what changed, but 
> it stopped working yesterday.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org