[jira] [Updated] (SPARK-30651) EXPLAIN EXTENDED does not show detail information for aggregate operators

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-30651:

Issue Type: Bug  (was: Improvement)

> EXPLAIN EXTENDED does not show detail information for aggregate operators
> -
>
> Key: SPARK-30651
> URL: https://issues.apache.org/jira/browse/SPARK-30651
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xin Wu
>Priority: Major
>
> Currently EXPLAIN FORMATTED only reports the input attributes of 
> HashAggregate/ObjectHashAggregate/SortAggregate, while EXPLAIN EXTENDED 
> provides more information. We need to enhance EXPLAIN FORMATTED to match the 
> original behavior.
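A minimal spark-shell sketch to compare the two outputs; the temp view {{t}} and columns {{k}}, {{v}} are made up for illustration and are not from the issue:

{code:java}
// Hypothetical repro: run the same aggregate query through both EXPLAIN modes
// and compare how much detail each reports for the HashAggregate node.
spark.range(100).selectExpr("id % 3 AS k", "id AS v").createOrReplaceTempView("t")
spark.sql("EXPLAIN FORMATTED SELECT k, SUM(v) FROM t GROUP BY k").show(false)
spark.sql("EXPLAIN EXTENDED SELECT k, SUM(v) FROM t GROUP BY k").show(false)
{code}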






[jira] [Commented] (SPARK-30688) Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF

2020-02-07 Thread Javier Fuentes (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032825#comment-17032825
 ] 

Javier Fuentes commented on SPARK-30688:


- When using week-based dates, the year pattern should be YY (week-based year); see
https://docs.oracle.com/javase/8/docs/api/java/time/format/DateTimeFormatterBuilder.html#appendPattern-java.lang.String-

- In JDK 8 there is a bug related to parsing ww without spaces or separator symbols: 
https://bugs.openjdk.java.net/browse/JDK-8145633
It can be reproduced with: DateTimeFormatter.ofPattern("ww").parse("201550")

- When using a format like -ww with the text 2020-10, I got the following 
exception:
{code:java}
java.time.format.DateTimeParseException: Text '2020-10' could not be parsed: 
Conflict found: Field WeekOfWeekBasedYear[WeekFields[SUNDAY,1]] 1 differs from 
WeekOfWeekBasedYear[WeekFields[SUNDAY,1]] 10 derived from 2020-01-01
{code}
This is caused by the default values for temporal fields set while building the 
DateTimeFormatter:
{code:java}
.parseDefaulting(ChronoField.ERA, 1)
.parseDefaulting(ChronoField.MONTH_OF_YEAR, 1)
.parseDefaulting(ChronoField.DAY_OF_MONTH, 1)
{code}
This is why ('2020-01', '-ww') works and ('2020-10', '-ww') does not: 
the 10th week of 2020 falls in March (2020-03-02), not in January 
(2020-01-01).

A quick fix for this issue is to detect whether a week-based pattern has been 
provided and change the default temporal fields:
{code:java}
.parseDefaulting(ChronoField.ERA, 1)
.parseDefaulting(ChronoField.DAY_OF_WEEK, 1)  
{code}
I have raised a PR. Let me know what you think.
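For illustration only (not the actual PR): a minimal sketch of the defaulting described above, assuming a week-based pattern such as "YYYY-ww"; the concrete pattern and code in the PR may differ.

{code:java}
import java.time.LocalDate
import java.time.format.DateTimeFormatterBuilder
import java.time.temporal.ChronoField

// Default DAY_OF_WEEK instead of MONTH_OF_YEAR/DAY_OF_MONTH when a week-based
// pattern is used, so week 10 of 2020 resolves without a conflict.
val weekFormatter = new DateTimeFormatterBuilder()
  .appendPattern("YYYY-ww")
  .parseDefaulting(ChronoField.ERA, 1)
  .parseDefaulting(ChronoField.DAY_OF_WEEK, 1)
  .toFormatter()

LocalDate.parse("2020-10", weekFormatter) // a date in week 10 of 2020 (early March)
{code}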

> Spark SQL Unix Timestamp produces incorrect result with unix_timestamp UDF
> --
>
> Key: SPARK-30688
> URL: https://issues.apache.org/jira/browse/SPARK-30688
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.2
>Reporter: Rajkumar Singh
>Priority: Major
>
>  
> {code:java}
> scala> spark.sql("select unix_timestamp('20201', 'ww')").show();
> +-------------------------+
> |unix_timestamp(20201, ww)|
> +-------------------------+
> |                     null|
> +-------------------------+
>  
> scala> spark.sql("select unix_timestamp('20202', 'ww')").show();
> +-------------------------+
> |unix_timestamp(20202, ww)|
> +-------------------------+
> |               1578182400|
> +-------------------------+
>  
> {code}
>  
>  
> This seems to happen for leap years only. I dug deeper into it, and it seems 
> that Spark is using java.text.SimpleDateFormat and tries to parse the 
> expression here:
> [org.apache.spark.sql.catalyst.expressions.UnixTime#eval|https://github.com/hortonworks/spark2/blob/49ec35bbb40ec6220282d932c9411773228725be/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala#L652]
> {code:java}
> formatter.parse(
>  t.asInstanceOf[UTF8String].toString).getTime / 1000L{code}
> but it fails: SimpleDateFormat is unable to parse the date and throws an 
> unparseable-date exception, which Spark handles silently by returning NULL.
>  
> *Spark-3.0:* I did some tests; Spark no longer uses the legacy 
> java.text.SimpleDateFormat but the java.time API, and it seems the java.time 
> API expects a valid date with a valid format:
>  org.apache.spark.sql.catalyst.util.Iso8601TimestampFormatter#parse






[jira] [Commented] (SPARK-30756) `ThriftServerWithSparkContextSuite` fails always on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3

2020-02-07 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032821#comment-17032821
 ] 

Yuming Wang commented on SPARK-30756:
-

Hi [~shaneknapp], do you have some special configuration? I cannot reproduce it 
on my local machine.

> `ThriftServerWithSparkContextSuite` fails always on 
> spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3
> -
>
> Key: SPARK-30756
> URL: https://issues.apache.org/jira/browse/SPARK-30756
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> This is a release blocker for 3.0.0.
> The same test profile Hadoop-2.7/Hive-2.3 fails on `branch-3.0` while it 
> succeeds in `master`.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3/
> The failure comes from `ThriftServerWithSparkContextSuite`.
> {code}
> [info] 
> org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite *** 
> ABORTED *** (40 seconds, 707 milliseconds)
> [info]   org.apache.hive.service.ServiceException: Failed to Start HiveServer2
> {code}






[jira] [Updated] (SPARK-23710) Upgrade the built-in Hive to 2.3.5 for hadoop-3.2

2020-02-07 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-23710:

Fix Version/s: 3.0.0

> Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
> -
>
> Key: SPARK-23710
> URL: https://issues.apache.org/jira/browse/SPARK-23710
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Critical
> Fix For: 3.0.0
>
>
> Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 
> 3.x to be an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for more 
> details. So we need to upgrade the built-in Hive for Hadoop 3.x. This is an 
> umbrella JIRA to track the upgrade.
>  
> *Upgrade Plan*:
>  # SPARK-27054 Remove the Calcite dependency. This can avoid some jar 
> conflicts.
>  # SPARK-23749 Replace built-in Hive API (isSub/toKryo) and remove 
> OrcProto.Type usage
>  # SPARK-27158, SPARK-27130 Update dev/* to support dynamic change profiles 
> when testing
>  # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and 
> compilation passes on Hive 2.3.4
>  # Add an empty hive-thriftserverV2 module, so that we can run all test cases 
> in the next step
>  # Make the Hadoop 3.1 with Hive 2.3.4 tests pass
>  # Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's 
> [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift]
>  
> I have completed the [initial 
> work|https://github.com/apache/spark/pull/24044] and plan to finish this 
> upgrade step by step.
>   
>  






[jira] [Resolved] (SPARK-23710) Upgrade the built-in Hive to 2.3.5 for hadoop-3.2

2020-02-07 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-23710.
-
Resolution: Fixed

> Upgrade the built-in Hive to 2.3.5 for hadoop-3.2
> -
>
> Key: SPARK-23710
> URL: https://issues.apache.org/jira/browse/SPARK-23710
> Project: Spark
>  Issue Type: Umbrella
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Critical
>
> Spark fails to run on Hadoop 3.x because Hive's ShimLoader considers Hadoop 
> 3.x to be an unknown Hadoop version; see SPARK-18673 and HIVE-16081 for more 
> details. So we need to upgrade the built-in Hive for Hadoop 3.x. This is an 
> umbrella JIRA to track the upgrade.
>  
> *Upgrade Plan*:
>  # SPARK-27054 Remove the Calcite dependency. This can avoid some jar 
> conflicts.
>  # SPARK-23749 Replace built-in Hive API (isSub/toKryo) and remove 
> OrcProto.Type usage
>  # SPARK-27158, SPARK-27130 Update dev/* to support dynamic change profiles 
> when testing
>  # Fix the ORC dependency conflict so that tests pass on Hive 1.2.1 and 
> compilation passes on Hive 2.3.4
>  # Add an empty hive-thriftserverV2 module, so that we can run all test cases 
> in the next step
>  # Make the Hadoop 3.1 with Hive 2.3.4 tests pass
>  # Adapt hive-thriftserverV2 from hive-thriftserver using Hive 2.3.4's 
> [TCLIService.thrift|https://github.com/apache/hive/blob/rel/release-2.3.4/service-rpc/if/TCLIService.thrift]
>  
> I have completed the [initial 
> work|https://github.com/apache/spark/pull/24044] and plan to finish this 
> upgrade step by step.
>   
>  






[jira] [Created] (SPARK-30757) Update the doc on TableCatalog.alterTable's behavior

2020-02-07 Thread Terry Kim (Jira)
Terry Kim created SPARK-30757:
-

 Summary: Update the doc on TableCatalog.alterTable's behavior
 Key: SPARK-30757
 URL: https://issues.apache.org/jira/browse/SPARK-30757
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Terry Kim


The documentation on TableCatalog.alterTable doesn't mention the order in which 
the requested changes will be applied. It would be useful to document this 
behavior explicitly.
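For instance, a hedged sketch of the ambiguity (the {{catalog}} value, identifier, and changes below are made up, not from the issue):

{code:java}
import org.apache.spark.sql.connector.catalog.{Identifier, TableCatalog, TableChange}
import org.apache.spark.sql.types.IntegerType

val catalog: TableCatalog = ??? // an assumed TableCatalog instance

// Whether the rename sees the column added by the first change depends on the
// order in which the changes are applied, which the doc does not spell out.
catalog.alterTable(
  Identifier.of(Array("db"), "tbl"),
  TableChange.addColumn(Array("c"), IntegerType),
  TableChange.renameColumn(Array("c"), "c2"))
{code}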






[jira] [Reopened] (SPARK-24884) Implement regexp_extract_all

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reopened SPARK-24884:
-

> Implement regexp_extract_all
> 
>
> Key: SPARK-24884
> URL: https://issues.apache.org/jira/browse/SPARK-24884
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.3.1
>Reporter: Nick Nicolini
>Priority: Major
>
> I've recently hit many cases of regexp parsing where we need to match on 
> something that is always arbitrary in length; for example, a text block that 
> looks something like:
> {code:java}
> AAA:WORDS|
> BBB:TEXT|
> MSG:ASDF|
> MSG:QWER|
> ...
> MSG:ZXCV|{code}
> I need to pull out all values between "MSG:" and "|", which can occur 
> between 1 and n times in each instance. I cannot reliably use the existing 
> {{regexp_extract}} method since the number of occurrences is always 
> arbitrary, and while I can write a UDF to handle this, it would be great if 
> this were supported natively in Spark.
> Perhaps we can implement something like {{regexp_extract_all}} as 
> [Presto|https://prestodb.io/docs/current/functions/regexp.html] and 
> [Pig|https://pig.apache.org/docs/latest/api/org/apache/pig/builtin/REGEX_EXTRACT_ALL.html]
>  have?
>  
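A hedged spark-shell sketch of the UDF workaround mentioned in the description; the UDF name, regex, and sample value are made up ({{regexp_extract_all}} itself does not exist yet):

{code:java}
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Collect every group-1 match instead of only the first occurrence.
val extractAllMsgs = udf { s: String =>
  "MSG:([^|]*)\\|".r.findAllMatchIn(s).map(_.group(1)).toSeq
}

val df = Seq("AAA:WORDS|BBB:TEXT|MSG:ASDF|MSG:QWER|MSG:ZXCV|").toDF("value")
df.select(extractAllMsgs($"value").as("msgs")).show(false) // [ASDF, QWER, ZXCV]
{code}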






[jira] [Assigned] (SPARK-30579) Document ORDER BY Clause of SELECT statement in SQL Reference.

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-30579:
---

Assignee: Dilip Biswal

> Document ORDER BY Clause of SELECT statement in SQL Reference.
> --
>
> Key: SPARK-30579
> URL: https://issues.apache.org/jira/browse/SPARK-30579
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Assignee: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Resolved] (SPARK-30579) Document ORDER BY Clause of SELECT statement in SQL Reference.

2020-02-07 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-30579.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> Document ORDER BY Clause of SELECT statement in SQL Reference.
> --
>
> Key: SPARK-30579
> URL: https://issues.apache.org/jira/browse/SPARK-30579
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL
>Affects Versions: 2.4.3
>Reporter: Dilip Biswal
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-21918) HiveClient shouldn't share Hive object between different thread

2020-02-07 Thread Chenhao Wu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032776#comment-17032776
 ] 

Chenhao Wu commented on SPARK-21918:


Is this still in progress?

> HiveClient shouldn't share Hive object between different thread
> ---
>
> Key: SPARK-21918
> URL: https://issues.apache.org/jira/browse/SPARK-21918
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Hu Liu,
>Priority: Major
>
> I'm testing the Spark Thrift Server and found that all DDL statements are 
> run as user hive even if hive.server2.enable.doAs=true.
> The root cause is that the Hive object is shared between different threads in 
> HiveClientImpl:
> {code:java}
>   private def client: Hive = {
> if (clientLoader.cachedHive != null) {
>   clientLoader.cachedHive.asInstanceOf[Hive]
> } else {
>   val c = Hive.get(conf)
>   clientLoader.cachedHive = c
>   c
> }
>   }
> {code}
> But in impersonation mode, we should share the Hive object only within a 
> thread so that the metastore client in Hive is associated with the right 
> user. To fix it, we can pass the Hive object of the parent thread to the 
> child thread when running the SQL.
> I already have an initial patch for review and I'm glad to work on it if 
> anyone could assign it to me.
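A hedged sketch of that idea (not the actual patch; only Hive.get(conf) is an existing Hive API, the other names are made up): cache the Hive handle per thread instead of in the shared clientLoader, so each impersonated session thread talks to the metastore as the right user.

{code:java}
// Hypothetical sketch inside HiveClientImpl; imports for Hive and conf assumed.
private val perThreadHive = new ThreadLocal[Hive]()

private def client: Hive = {
  val cached = perThreadHive.get()
  if (cached != null) {
    cached
  } else {
    val c = Hive.get(conf) // a Hive handle for the current thread
    perThreadHive.set(c)
    c
  }
}
{code}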






[jira] [Reopened] (SPARK-30659) LogisticRegression blockify input vectors

2020-02-07 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reopened SPARK-30659:
--

need more tests on sparse datasets

> LogisticRegression blockify input vectors
> -
>
> Key: SPARK-30659
> URL: https://issues.apache.org/jira/browse/SPARK-30659
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>







[jira] [Reopened] (SPARK-30660) LinearRegression blockify input vectors

2020-02-07 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reopened SPARK-30660:
--

need more tests on sparse datasets

> LinearRegression blockify input vectors
> ---
>
> Key: SPARK-30660
> URL: https://issues.apache.org/jira/browse/SPARK-30660
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>







[jira] [Reopened] (SPARK-30642) LinearSVC blockify input vectors

2020-02-07 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reopened SPARK-30642:
--

need more tests on sparse datasets

> LinearSVC blockify input vectors
> 
>
> Key: SPARK-30642
> URL: https://issues.apache.org/jira/browse/SPARK-30642
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.0.0
>
>







[jira] [Created] (SPARK-30756) `ThriftServerWithSparkContextSuite` fails always on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3

2020-02-07 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-30756:
-

 Summary: `ThriftServerWithSparkContextSuite` fails always on 
spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3
 Key: SPARK-30756
 URL: https://issues.apache.org/jira/browse/SPARK-30756
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


This is a release blocker for 3.0.0.

The same test profile Hadoop-2.7/Hive-2.3 fails on `branch-3.0` while it 
succeeds in `master`.
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/
- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3/

The failure comes from `ThriftServerWithSparkContextSuite`.
{code}
[info] org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite 
*** ABORTED *** (40 seconds, 707 milliseconds)
[info]   org.apache.hive.service.ServiceException: Failed to Start HiveServer2
{code}






[jira] [Commented] (SPARK-30756) `ThriftServerWithSparkContextSuite` fails always on spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3

2020-02-07 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032744#comment-17032744
 ] 

Dongjoon Hyun commented on SPARK-30756:
---

Hi, [~yumwang]. Could you take a look please?

> `ThriftServerWithSparkContextSuite` fails always on 
> spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3
> -
>
> Key: SPARK-30756
> URL: https://issues.apache.org/jira/browse/SPARK-30756
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> This is a release blocker for 3.0.0.
> The same test profile Hadoop-2.7/Hive-2.3 fails on `branch-3.0` while it 
> succeeds in `master`.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7-hive-2.3/
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-3.0-test-sbt-hadoop-2.7-hive-2.3/
> The failure comes from `ThriftServerWithSparkContextSuite`.
> {code}
> [info] 
> org.apache.spark.sql.hive.thriftserver.ThriftServerWithSparkContextSuite *** 
> ABORTED *** (40 seconds, 707 milliseconds)
> [info]   org.apache.hive.service.ServiceException: Failed to Start HiveServer2
> {code}






[jira] [Resolved] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2020-02-07 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27298.
---
   Fix Version/s: 2.4.4
Target Version/s:   (was: 3.0.0)
  Resolution: Fixed

Remove `Target Version: 3.0.0` and set `Fix Version: 2.4.4`.

> Dataset except operation gives different results(dataset count) on Spark 
> 2.3.0 Windows and Spark 2.3.0 Linux environment
> 
>
> Key: SPARK-27298
> URL: https://issues.apache.org/jira/browse/SPARK-27298
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.2
>Reporter: Mahima Khatri
>Priority: Blocker
>  Labels: data-loss
> Fix For: 2.4.4
>
> Attachments: Console-Result-Windows.txt, 
> Linux-spark-2.3.0_result.txt, Linux-spark-2.4.4_result.txt, 
> console-reslt-2.3.3-linux.txt, console-result-2.3.3-windows.txt, 
> console-result-LinuxonVM.txt, console-result-spark-2.4.2-linux, 
> console-result-spark-2.4.2-windows, customer.csv, pom.xml
>
>
> {code:java}
> // package com.verifyfilter.example;
> import org.apache.spark.SparkConf;
> import org.apache.spark.SparkContext;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.Column;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SaveMode;
> public class ExcludeInTesting {
> public static void main(String[] args) {
> SparkSession spark = SparkSession.builder()
> .appName("ExcludeInTesting")
> .config("spark.some.config.option", "some-value")
> .getOrCreate();
> Dataset dataReadFromCSV = spark.read().format("com.databricks.spark.csv")
> .option("header", "true")
> .option("delimiter", "|")
> .option("inferSchema", "true")
> //.load("E:/resources/customer.csv"); local //below path for VM
> .load("/home/myproject/bda/home/bin/customer.csv");
> dataReadFromCSV.printSchema();
> dataReadFromCSV.show();
> //Adding an extra step of saving to db and then loading it again
> dataReadFromCSV.write().mode(SaveMode.Overwrite).saveAsTable("customer");
> Dataset dataLoaded = spark.sql("select * from customer");
> //Gender EQ M
> Column genderCol = dataLoaded.col("Gender");
> Dataset onlyMaleDS = dataLoaded.where(genderCol.equalTo("M"));
> //Dataset onlyMaleDS = spark.sql("select count(*) from customer where 
> Gender='M'");
> onlyMaleDS.show();
> System.out.println("The count of Male customers is :"+ onlyMaleDS.count());
> System.out.println("*");
> // Income in the list
> Object[] valuesArray = new Object[5];
> valuesArray[0]=503.65;
> valuesArray[1]=495.54;
> valuesArray[2]=486.82;
> valuesArray[3]=481.28;
> valuesArray[4]=479.79;
> Column incomeCol = dataLoaded.col("Income");
> Dataset incomeMatchingSet = dataLoaded.where(incomeCol.isin((Object[]) 
> valuesArray));
> System.out.println("The count of customers satisfaying Income is :"+ 
> incomeMatchingSet.count());
> System.out.println("*");
> Dataset maleExcptIncomeMatch = onlyMaleDS.except(incomeMatchingSet);
> System.out.println("The count of final customers is :"+ 
> maleExcptIncomeMatch.count());
> System.out.println("*");
> }
> }
> {code}
>  When the above code is executed on Spark 2.3.0, it gives different results:
> *Windows*: The code gives the correct dataset count of 148237.
> *Linux*: The code gives a different {color:#172b4d}dataset count of 
> 129532{color}.
>  
> {color:#172b4d}Some more info related to this bug:{color}
> {color:#172b4d}1. Application Code (attached)
> 2. CSV file used(attached)
> 3. Windows spec 
>           Windows 10- 64 bit OS 
> 4. Linux spec (Running on Oracle VM virtual box)
>       Specifications: \{as captured from Vbox.log}
>         00:00:26.112908 VMMDev: Guest Additions information report: Version 
> 5.0.32 r112930          '5.0.32_Ubuntu'
>         00:00:26.112996 VMMDev: Guest Additions information report: Interface 
> = 0x00010004         osType = 0x00053100 (Linux >= 2.6, 64-bit)
> 5. Snapshots of output in both cases (attached){color}






[jira] [Commented] (SPARK-27298) Dataset except operation gives different results(dataset count) on Spark 2.3.0 Windows and Spark 2.3.0 Linux environment

2020-02-07 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032740#comment-17032740
 ] 

Dongjoon Hyun commented on SPARK-27298:
---

[~ksunitha], SPARK-26366 was backported to 2.4.1, and the attached 
`console-result-spark-2.4.2-windows` shows a wrong result on Windows, so I'm not 
sure that SPARK-26366 is related to this. BTW, since [~Mahima] verified 
that this is fixed in 2.4.4, I'll update the JIRA according to that 
information. We can add more information later when we find out which JIRA 
exactly fixed this.

> Dataset except operation gives different results(dataset count) on Spark 
> 2.3.0 Windows and Spark 2.3.0 Linux environment
> 
>
> Key: SPARK-27298
> URL: https://issues.apache.org/jira/browse/SPARK-27298
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0, 2.4.2
>Reporter: Mahima Khatri
>Priority: Blocker
>  Labels: data-loss
> Attachments: Console-Result-Windows.txt, 
> Linux-spark-2.3.0_result.txt, Linux-spark-2.4.4_result.txt, 
> console-reslt-2.3.3-linux.txt, console-result-2.3.3-windows.txt, 
> console-result-LinuxonVM.txt, console-result-spark-2.4.2-linux, 
> console-result-spark-2.4.2-windows, customer.csv, pom.xml
>
>
> {code:java}
> // package com.verifyfilter.example;
> import org.apache.spark.SparkConf;
> import org.apache.spark.SparkContext;
> import org.apache.spark.sql.SparkSession;
> import org.apache.spark.sql.Column;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Row;
> import org.apache.spark.sql.SaveMode;
> public class ExcludeInTesting {
> public static void main(String[] args) {
> SparkSession spark = SparkSession.builder()
> .appName("ExcludeInTesting")
> .config("spark.some.config.option", "some-value")
> .getOrCreate();
> Dataset dataReadFromCSV = spark.read().format("com.databricks.spark.csv")
> .option("header", "true")
> .option("delimiter", "|")
> .option("inferSchema", "true")
> //.load("E:/resources/customer.csv"); local //below path for VM
> .load("/home/myproject/bda/home/bin/customer.csv");
> dataReadFromCSV.printSchema();
> dataReadFromCSV.show();
> //Adding an extra step of saving to db and then loading it again
> dataReadFromCSV.write().mode(SaveMode.Overwrite).saveAsTable("customer");
> Dataset dataLoaded = spark.sql("select * from customer");
> //Gender EQ M
> Column genderCol = dataLoaded.col("Gender");
> Dataset onlyMaleDS = dataLoaded.where(genderCol.equalTo("M"));
> //Dataset onlyMaleDS = spark.sql("select count(*) from customer where 
> Gender='M'");
> onlyMaleDS.show();
> System.out.println("The count of Male customers is :"+ onlyMaleDS.count());
> System.out.println("*");
> // Income in the list
> Object[] valuesArray = new Object[5];
> valuesArray[0]=503.65;
> valuesArray[1]=495.54;
> valuesArray[2]=486.82;
> valuesArray[3]=481.28;
> valuesArray[4]=479.79;
> Column incomeCol = dataLoaded.col("Income");
> Dataset incomeMatchingSet = dataLoaded.where(incomeCol.isin((Object[]) 
> valuesArray));
> System.out.println("The count of customers satisfaying Income is :"+ 
> incomeMatchingSet.count());
> System.out.println("*");
> Dataset maleExcptIncomeMatch = onlyMaleDS.except(incomeMatchingSet);
> System.out.println("The count of final customers is :"+ 
> maleExcptIncomeMatch.count());
> System.out.println("*");
> }
> }
> {code}
>  When the above code is executed on Spark 2.3.0, it gives different results:
> *Windows*: The code gives the correct dataset count of 148237.
> *Linux*: The code gives a different {color:#172b4d}dataset count of 
> 129532{color}.
>  
> {color:#172b4d}Some more info related to this bug:{color}
> {color:#172b4d}1. Application Code (attached)
> 2. CSV file used(attached)
> 3. Windows spec 
>           Windows 10- 64 bit OS 
> 4. Linux spec (Running on Oracle VM virtual box)
>       Specifications: \{as captured from Vbox.log}
>         00:00:26.112908 VMMDev: Guest Additions information report: Version 
> 5.0.32 r112930          '5.0.32_Ubuntu'
>         00:00:26.112996 VMMDev: Guest Additions information report: Interface 
> = 0x00010004         osType = 0x00053100 (Linux >= 2.6, 64-bit)
> 5. Snapshots of output in both cases (attached){color}






[jira] [Updated] (SPARK-30289) Partitioned by Nested Column for `InMemoryTable`

2020-02-07 Thread DB Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

DB Tsai updated SPARK-30289:

Summary: Partitioned by Nested Column for `InMemoryTable`  (was: DSv2 
partitioning should not accept nested columns)

> Partitioned by Nested Column for `InMemoryTable`
> 
>
> Key: SPARK-30289
> URL: https://issues.apache.org/jira/browse/SPARK-30289
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Commented] (SPARK-29292) Fix internal usages of mutable collection for Seq in 2.13

2020-02-07 Thread Guillaume Martres (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17032671#comment-17032671
 ] 

Guillaume Martres commented on SPARK-29292:
---

> I can try updating my fork and pushing the commits to a branch, for 
> evaluation in _2.12_ at least. 

I'd be interested in seeing this branch too.

>  I just wasn't bothering until it seemed like 2.13 support blockers were 
> removed

What are the other blockers for 2.13 support?

> Fix internal usages of mutable collection for Seq in 2.13
> -
>
> Key: SPARK-29292
> URL: https://issues.apache.org/jira/browse/SPARK-29292
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Sean R. Owen
>Assignee: Sean R. Owen
>Priority: Minor
>
> Kind of related to https://issues.apache.org/jira/browse/SPARK-27681, but a 
> simpler subset. 
> In 2.13, a mutable collection can't be returned as a 
> {{scala.collection.Seq}}. It's easy enough to call .toSeq on these as that 
> still works on 2.12.
> {code}
> [ERROR] [Error] 
> /Users/seanowen/Documents/spark_2.13/core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala:467:
>  type mismatch;
>  found   : Seq[String] (in scala.collection) 
>  required: Seq[String] (in scala.collection.immutable) 
> {code}
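A hedged illustration of the .toSeq approach described above (the method and data are made up, not the actual Spark code):

{code:java}
import scala.collection.mutable.ArrayBuffer

// Returning a mutable buffer as Seq[...] fails to compile on 2.13, where Seq is
// scala.collection.immutable.Seq; calling .toSeq compiles on both 2.12 and 2.13.
def executorIds: Seq[String] = {
  val buf = ArrayBuffer("exec-1", "exec-2") // hypothetical data
  buf.toSeq
}
{code}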






[jira] [Resolved] (SPARK-30753) Remove unnecessary changes in SPARK-30684

2020-02-07 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30753?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-30753.

Resolution: Duplicate

Closing this one and making https://github.com/apache/spark/pull/27490 a 
follow-up.

> Remove unnecessary changes in SPARK-30684 
> --
>
> Key: SPARK-30753
> URL: https://issues.apache.org/jira/browse/SPARK-30753
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Minor
>
> In https://github.com/apache/spark/pull/27405, the UI changes can be made 
> without touching `SparkPlanGraphNode`. Also, it seems that we don't need 
> the function `adjustPositionOfOperationName` to adjust the position of the 
> operation name and mark it as an operation-name class.






[jira] [Resolved] (SPARK-30752) Wrong result of to_utc_timestamp() on daylight saving day

2020-02-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30752.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 27474
[https://github.com/apache/spark/pull/27474]

> Wrong result of to_utc_timestamp() on daylight saving day
> -
>
> Key: SPARK-30752
> URL: https://issues.apache.org/jira/browse/SPARK-30752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> The to_utc_timestamp() function returns a wrong result when:
> * the JVM system time zone is PST
> * the session local time zone is UTC
> * fromZone is Asia/Hong_Kong
> For the local timestamp '2019-11-03T12:00:00', the result must be 
> '2019-11-03T04:00:00':
> {code}
> scala> import java.util.TimeZone
> import java.util.TimeZone
> scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> TimeZone.setDefault(getTimeZone("PST"))
> scala> spark.conf.set("spark.sql.session.timeZone", "UTC")
> scala> val df = Seq("2019-11-03T12:00:00").toDF("localTs")
> df: org.apache.spark.sql.DataFrame = [localTs: string]
> scala> df.select(to_utc_timestamp(col("localTs"), "Asia/Hong_Kong")).show
> +-----------------------------------------+
> |to_utc_timestamp(localTs, Asia/Hong_Kong)|
> +-----------------------------------------+
> |                      2019-11-03 03:00:00|
> +-----------------------------------------+
> {code}
>  
> See 
> https://www.worldtimebuddy.com/?qm=1=8,1819729,100=8=2019-11-2=21-22






[jira] [Assigned] (SPARK-30752) Wrong result of to_utc_timestamp() on daylight saving day

2020-02-07 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30752:
---

Assignee: Maxim Gekk

> Wrong result of to_utc_timestamp() on daylight saving day
> -
>
> Key: SPARK-30752
> URL: https://issues.apache.org/jira/browse/SPARK-30752
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4, 3.0.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> The to_utc_timestamp() function returns a wrong result when:
> * the JVM system time zone is PST
> * the session local time zone is UTC
> * fromZone is Asia/Hong_Kong
> For the local timestamp '2019-11-03T12:00:00', the result must be 
> '2019-11-03T04:00:00':
> {code}
> scala> import java.util.TimeZone
> import java.util.TimeZone
> scala> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> import org.apache.spark.sql.catalyst.util.DateTimeUtils._
> scala> import org.apache.spark.sql.functions._
> import org.apache.spark.sql.functions._
> scala> TimeZone.setDefault(getTimeZone("PST"))
> scala> spark.conf.set("spark.sql.session.timeZone", "UTC")
> scala> val df = Seq("2019-11-03T12:00:00").toDF("localTs")
> df: org.apache.spark.sql.DataFrame = [localTs: string]
> scala> df.select(to_utc_timestamp(col("localTs"), "Asia/Hong_Kong")).show
> +-----------------------------------------+
> |to_utc_timestamp(localTs, Asia/Hong_Kong)|
> +-----------------------------------------+
> |                      2019-11-03 03:00:00|
> +-----------------------------------------+
> {code}
>  
> See 
> https://www.worldtimebuddy.com/?qm=1=8,1819729,100=8=2019-11-2=21-22






[jira] [Assigned] (SPARK-30747) Update roxygen2 to 7.0.1

2020-02-07 Thread Shane Knapp (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shane Knapp reassigned SPARK-30747:
---

Assignee: Shane Knapp

> Update roxygen2 to 7.0.1
> 
>
> Key: SPARK-30747
> URL: https://issues.apache.org/jira/browse/SPARK-30747
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR, Tests
>Affects Versions: 3.0.0
>Reporter: Maciej Szymkiewicz
>Assignee: Shane Knapp
>Priority: Minor
>
> Currently Spark uses {{roxygen2}} 5.0.1. It is already pretty old 
> (2015-11-11), so it could be a good idea to update it along with the current 
> R updates.
> On a crude inspection:
> * SPARK-22430 was resolved a while ago.
> * SPARK-30737 and SPARK-27262 (https://github.com/apache/spark/pull/27437 and 
> https://github.com/apache/spark/commit/b95ccb1d8b726b11435789cdb5882df6643430ed)
>  resolved persisting warnings
> * Documentation builds and CRAN checks pass
> * Generated HTML docs are identical to 5.0.1
> Since {{roxygen2}} shares some potentially unstable dependencies with 
> {{devtools}} (primarily {{rlang}}) it might be a good idea to keep these in 
> sync (as a bonus we wouldn't have to worry about {{DESCRIPTION}} being 
> overwritten by local tests).






[jira] [Created] (SPARK-30755) Support Hive 1.2.1's Serde after making built-in Hive to 2.3

2020-02-07 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-30755:
---

 Summary: Support Hive 1.2.1's Serde after making built-in Hive to 
2.3
 Key: SPARK-30755
 URL: https://issues.apache.org/jira/browse/SPARK-30755
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang



{noformat}
2020-01-27 05:11:20.446 - stderr> 20/01/27 05:11:20 INFO DAGScheduler: 
ResultStage 2 (main at NativeMethodAccessorImpl.java:0) failed in 1.000 s due 
to Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most 
recent failure: Lost task 0.3 in stage 2.0 (TID 13, 10.110.21.210, executor 1): 
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/serde2/SerDe
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass1(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.defineClass(ClassLoader.java:756)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.access$100(URLClassLoader.java:74)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:369)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader$1.run(URLClassLoader.java:363)
  2020-01-27 05:11:20.446 - stderr>  at 
java.security.AccessController.doPrivileged(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:362)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.446 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:405)
  2020-01-27 05:11:20.446 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName0(Native Method)
  2020-01-27 05:11:20.446 - stderr>  at java.lang.Class.forName(Class.java:348)
  2020-01-27 05:11:20.446 - stderr>  at 
org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializerClass(TableDesc.java:76)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveOutputWriter.<init>(HiveFileFormat.scala:119)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.hive.execution.HiveFileFormat$$anon$1.newInstance(HiveFileFormat.scala:104)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:267)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:208)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.doRunTask(Task.scala:144)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.scheduler.Task.run(Task.scala:117)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$6(Executor.scala:567)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1559)
  2020-01-27 05:11:20.447 - stderr>  at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:570)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  2020-01-27 05:11:20.447 - stderr>  at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  2020-01-27 05:11:20.447 - stderr>  at java.lang.Thread.run(Thread.java:748)
  2020-01-27 05:11:20.447 - stderr> Caused by: 
java.lang.ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDe
  2020-01-27 05:11:20.447 - stderr>  at 
java.net.URLClassLoader.findClass(URLClassLoader.java:382)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:418)
  2020-01-27 05:11:20.447 - stderr>  at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
  2020-01-27 05:11:20.447 - stderr>  at 
java.lang.ClassLoader.loadClass(ClassLoader.java:351)
  2020-01-27 05:11:20.447 - stderr>  ... 31 more
{noformat}





[jira] [Created] (SPARK-30754) Reuse results of floorDiv in calculations of floorMod in DateTimeUtils

2020-02-07 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-30754:
--

 Summary: Reuse results of floorDiv in calculations of floorMod in 
DateTimeUtils
 Key: SPARK-30754
 URL: https://issues.apache.org/jira/browse/SPARK-30754
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Maxim Gekk


A couple of methods in DateTimeUtils call Math.floorDiv and Math.floorMod with the 
same arguments. In such cases, the result of Math.floorDiv can be reused in the 
calculation of Math.floorMod. For example, this optimization can be applied to 
microsToInstant and truncDate.
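A hedged sketch of the optimization (the constant and input value are made up; not the actual Spark change), based on the identity Math.floorMod(a, b) == a - Math.floorDiv(a, b) * b:

{code:java}
val MICROS_PER_SECOND = 1000000L
val micros = -123456789L // hypothetical input

// Before: two divisions.
// val secs = Math.floorDiv(micros, MICROS_PER_SECOND)
// val rem  = Math.floorMod(micros, MICROS_PER_SECOND)

// After: reuse the floorDiv result to compute the remainder.
val secs = Math.floorDiv(micros, MICROS_PER_SECOND)
val rem  = micros - secs * MICROS_PER_SECOND // equals Math.floorMod(micros, MICROS_PER_SECOND)
{code}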






[jira] [Created] (SPARK-30753) Remove unnecessary changes in SPARK-30684

2020-02-07 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-30753:
--

 Summary: Remove unnecessary changes in SPARK-30684 
 Key: SPARK-30753
 URL: https://issues.apache.org/jira/browse/SPARK-30753
 Project: Spark
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 3.0.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In https://github.com/apache/spark/pull/27405, the UI changes can be made 
without touching `SparkPlanGraphNode`. Also, it seems that we don't need 
the function `adjustPositionOfOperationName` to adjust the position of the 
operation name and mark it as an operation-name class.




