[jira] [Commented] (SPARK-30077) create TEMPORARY VIEW USING should look up catalog/table like v2 commands

2019-11-29 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984803#comment-16984803
 ] 

Huaxin Gao commented on SPARK-30077:


I will work on this

> create TEMPORARY VIEW USING should look up catalog/table like v2 commands
> -
>
> Key: SPARK-30077
> URL: https://issues.apache.org/jira/browse/SPARK-30077
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Major
>
> create TEMPORARY VIEW USING should look up catalog/table like v2 commands






[jira] [Created] (SPARK-30077) create TEMPORARY VIEW USING should look up catalog/table like v2 commands

2019-11-29 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-30077:
--

 Summary: create TEMPORARY VIEW USING should look up catalog/table 
like v2 commands
 Key: SPARK-30077
 URL: https://issues.apache.org/jira/browse/SPARK-30077
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Huaxin Gao


create TEMPORARY VIEW USING should look up catalog/table like v2 commands
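For context, the statement this sub-task covers is the data-source-backed temporary view. A minimal sketch of the form in question, run from PySpark (the view name, format and path are hypothetical, and a SparkSession named {{spark}} is assumed):

{code:python}
# Hypothetical view name and path; CREATE TEMPORARY VIEW ... USING is the
# statement whose name resolution should go through the same catalog/table
# lookup as the other v2 commands.
spark.sql("""
  CREATE TEMPORARY VIEW events_view
  USING parquet
  OPTIONS (path '/tmp/events')
""")
spark.sql("SELECT count(*) FROM events_view").show()
{code}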






[jira] [Updated] (SPARK-30069) Clean up non-shuffle disk block manager files following executor exits on YARN

2019-11-29 Thread Lantao Jin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lantao Jin updated SPARK-30069:
---
Component/s: Spark Core

> Clean up non-shuffle disk block manager files following executor exits on 
> YARN
> ---
>
> Key: SPARK-30069
> URL: https://issues.apache.org/jira/browse/SPARK-30069
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, YARN
>Affects Versions: 3.0.0
>Reporter: Lantao Jin
>Priority: Major
>
> Currently we only clean up the local directories when the application is 
> removed. However, when executors die and restart repeatedly, many temp files 
> are left untouched in the local directories, which is undesired behavior and 
> gradually uses up disk space.
> SPARK-24340 fixed this problem in standalone mode, but in YARN mode the issue 
> still exists. Especially in a long-running service like the Spark 
> thrift-server with dynamic resource allocation disabled, it can easily fill 
> up the local disk.






[jira] [Created] (SPARK-30078) flatMapGroupsWithState failure

2019-11-29 Thread salamani (Jira)
salamani created SPARK-30078:


 Summary: flatMapGroupsWithState failure
 Key: SPARK-30078
 URL: https://issues.apache.org/jira/browse/SPARK-30078
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.4.4
Reporter: salamani


I have built Apache Spark v2.4.4 on a big-endian platform with AdoptJDK OpenJ9 
1.8.0_202.

My build is successful. However, while running the Scala tests of the "Spark 
Project SQL" module, I am facing failures in FlatMapGroupsWithStateSuite; the 
error log is attached.






[jira] [Updated] (SPARK-30078) flatMapGroupsWithState failure

2019-11-29 Thread salamani (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

salamani updated SPARK-30078:
-
Attachment: FlatMapGroupsWithStateSuite.txt

> flatMapGroupsWithState failure
> --
>
> Key: SPARK-30078
> URL: https://issues.apache.org/jira/browse/SPARK-30078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 2.4.4
>Reporter: salamani
>Priority: Major
>  Labels: big-endian
> Attachments: FlatMapGroupsWithStateSuite.txt
>
>
> I have built Apache Spark v2.4.4 on a big-endian platform with AdoptJDK 
> OpenJ9 1.8.0_202.
> My build is successful. However, while running the Scala tests of the "Spark 
> Project SQL" module, I am facing failures in FlatMapGroupsWithStateSuite; 
> the error log is attached.






[jira] [Created] (SPARK-30079) Tests fail in environments with locale different from en_US

2019-11-29 Thread Lukas Menzel (Jira)
Lukas Menzel created SPARK-30079:


 Summary: Tests fail in environments with locale different from 
en_US
 Key: SPARK-30079
 URL: https://issues.apache.org/jira/browse/SPARK-30079
 Project: Spark
  Issue Type: Bug
  Components: Build, Tests
Affects Versions: 3.0.0
 Environment: any environment with a non-English locale and/or different 
number separators.
Reporter: Lukas Menzel


Tests fail on systems with a locale other than en_US.

Assertions on exception messages fail because Java localizes them according to 
the system environment (e.g. org.apache.spark.deploy.SparkSubmitSuite).

Other tests fail because of assertions about formatted numbers, which use 
different separators (see 
[https://docs.oracle.com/cd/E19455-01/806-0169/overview-9/index.html]).
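To illustrate the separator point, here is a minimal sketch using Python's {{locale}} module rather than the actual Spark test code (the locale names are assumptions and must be installed on the machine):

{code:python}
import locale

# The same number renders with '.' or ',' as the decimal separator depending
# on the active locale, which is what breaks assertions on formatted numbers.
for loc in ("en_US.UTF-8", "de_DE.UTF-8"):
    try:
        locale.setlocale(locale.LC_NUMERIC, loc)
    except locale.Error:
        continue  # locale not installed on this machine
    print(loc, locale.format_string("%.1f", 1234.5, grouping=True))
# en_US.UTF-8 -> 1,234.5
# de_DE.UTF-8 -> 1.234,5
{code}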

 

 

 






[jira] [Commented] (SPARK-30080) ADD/LIST Resources should look up catalog/table like v2 commands

2019-11-29 Thread Aman Omer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984901#comment-16984901
 ] 

Aman Omer commented on SPARK-30080:
---

I will work on this

> ADD/LIST Resources should look up catalog/table like v2 commands
> 
>
> Key: SPARK-30080
> URL: https://issues.apache.org/jira/browse/SPARK-30080
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Aman Omer
>Priority: Major
>
> ADD/LIST Resources should look up catalog/table like v2 commands






[jira] [Created] (SPARK-30080) ADD/LIST Resources should look up catalog/table like v2 commands

2019-11-29 Thread Aman Omer (Jira)
Aman Omer created SPARK-30080:
-

 Summary: ADD/LIST Resources should look up catalog/table like v2 
commands
 Key: SPARK-30080
 URL: https://issues.apache.org/jira/browse/SPARK-30080
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Aman Omer


ADD/LIST Resources should look up catalog/table like v2 commands
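For context, the statements covered here are the resource-management commands. A hedged sketch of what they look like from PySpark (hypothetical paths; a SparkSession named {{spark}} is assumed):

{code:python}
# Hypothetical paths; these are the ADD/LIST resource statements whose parsing
# should move to the same lookup framework as the other v2 commands.
spark.sql("ADD JAR /tmp/my-udfs.jar")
spark.sql("LIST JARS")
spark.sql("ADD FILE /tmp/lookup.csv")
spark.sql("LIST FILES")
{code}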






[jira] [Created] (SPARK-30081) StreamingAggregationSuite failure on zLinux(big endian)

2019-11-29 Thread Dev Leishangthem (Jira)
Dev Leishangthem created SPARK-30081:


 Summary: StreamingAggregationSuite failure on zLinux(big endian)
 Key: SPARK-30081
 URL: https://issues.apache.org/jira/browse/SPARK-30081
 Project: Spark
  Issue Type: Bug
  Components: SQL, Tests
Affects Versions: 2.4.4
Reporter: Dev Leishangthem


The tests fail in 3 instances; the first two are:

{code}
[info] - SPARK-23004: Ensure that TypedImperativeAggregate functions do not throw errors - state format version 1 *** FAILED *** (760 milliseconds)
[info] Assert on query failed: : Query [id = 065b66ad-227a-46a4-9d9d-50d27672f02a, runId = 99c001b7-45df-4977-89b6-f68970378f4b] terminated with exception: Job aborted due to stage failure: Task 0 in stage 192.0 failed 1 times, most recent failure: Lost task 0.0 in stage 192.0 (TID 518, localhost, executor driver): java.lang.AssertionError: sizeInBytes (76) should be a multiple of 8
[info] at org.apache.spark.sql.catalyst.expressions.UnsafeRow.pointTo(UnsafeRow.java:168)
[info] at org.apache.spark.sql.execution.UnsafeKVExternalSorter$KVSorterIterator.next(UnsafeKVExternalSorter.java:297)
[info] at org.apache.spark.sql.execution.aggregate.SortBasedAggregator$$anon$1.<init>(ObjectAggregationIterator.scala:242)
[info] at org.apache.spark.sql.execution.aggregate.SortBasedAggregator.destructiveIterator(ObjectAggregationIterator.scala:239)
[info] at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.processInputs(ObjectAggregationIterator.scala:198)
[info] at org.apache.spark.sql.execution.aggregate.ObjectAggregationIterator.<init>(ObjectAggregationIterator.scala:78)
[info] at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:114)
[info] at org.apache.spark.sql.execution.aggregate.ObjectHashAggregateExec$$anonfun$doExecute$1$$anonfun$2.apply(ObjectHashAggregateExec.scala:105)
[info] at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
[info] at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndexInternal$1$$anonfun$12.apply(RDD.scala:823)
[info] at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
[info] at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
{code}

 

 

and the third one is:

{code}
[info] - simple count, update mode - recovery from checkpoint uses state format version 1 *** FAILED *** (1 second, 21 milliseconds)
[info] == Results ==
[info] !== Correct Answer - 3 == == Spark Answer - 3 ==
[info] !struct<_1:int,_2:int> struct
[info] [1,1] [1,1]
[info] ![2,2] [2,1]
[info] ![3,3] [3,1]
[info]
[info]
[info] == Progress ==
[info] StartStream(ProcessingTime(0),org.apache.spark.util.SystemClock@f12c12fb,Map(spark.sql.streaming.aggregation.stateFormatVersion -> 2),/scratch/devleish/spark/target/tmp/spark-5a533a9c-da17-41f9-a7d4-c3309d1c2b6f)
[info] AddData to MemoryStream[value#1713]: 3,2,1
[info] => CheckLastBatch: [3,3],[2,2],[1,1]
[info] AssertOnQuery(, name)
[info] AddData to MemoryStream[value#1713]: 4,4,4,4
[info] CheckLastBatch: [4,4]
{code}

This commit 
[https://github.com/apache/spark/commit/ebbe589d12434bc108672268bee05a7b7e571ee6]
ensures that the value is a multiple of 8, but it looks like that is not the 
case on a big-endian platform.
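For reference, the rule the failing assertion checks amounts to a byte size being a whole number of 8-byte words. A minimal sketch of that arithmetic in plain Python (an illustration only, not Spark's actual code path):

{code:python}
# sizeInBytes must be a whole number of 8-byte words; 76 is not, hence the
# AssertionError above. Sizes are normally rounded up to the next word.
def round_to_word(num_bytes, word_size=8):
    remainder = num_bytes % word_size
    return num_bytes if remainder == 0 else num_bytes + (word_size - remainder)

print(round_to_word(76))  # 80: 76 is not 8-byte aligned
print(round_to_word(80))  # 80: already aligned
{code}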






[jira] [Created] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)
John Ayad created SPARK-30082:
-

 Summary: Zeros are being treated as NaNs
 Key: SPARK-30082
 URL: https://issues.apache.org/jira/browse/SPARK-30082
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.4.4
Reporter: John Ayad


If you attempt to run
{code}
df = df.replace(float('nan'), somethingToReplaceWith)
{code}
It will replace all {{0}}s in columns of type {{Integer}}

Example code snippet to repro this:
{code}
from pyspark.sql import SQLContext
spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
{code}

Here's the output I get when I run this code:
{code}
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
  /_/

Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|0|
|2|3|
|3|0|
+-+-+

>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|5|
|2|3|
|3|5|
+-+-+

>>>
{code}






[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Ayad updated SPARK-30082:
--
Description: 
If you attempt to run
{code:java}
df = df.replace(float('nan'), somethingToReplaceWith)
{code}
It will replace all {{0}} s in columns of type {{Integer}}

Example code snippet to repro this:
{code:java}
from pyspark.sql import SQLContext
spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
{code}
Here's the output I get when I run this code:
{code:java}
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
  /_/

Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|0|
|2|3|
|3|0|
+-+-+

>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|5|
|2|3|
|3|5|
+-+-+

>>>
{code}

  was:
If you attempt to run
{code}
df = df.replace(float('nan'), somethingToReplaceWith)
{code}
It will replace all {{0}}s in columns of type {{Integer}}

Example code snippet to repro this:
{code}
from pyspark.sql import SQLContext
spark = SQLContext(sc).sparkSession
df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
df.show()
df = df.replace(float('nan'), 5)
df.show()
{code}

Here's the output I get when I run this code:
{code}
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.4
  /_/

Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
SparkSession available as 'spark'.
>>> from pyspark.sql import SQLContext
>>> spark = SQLContext(sc).sparkSession
>>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|0|
|2|3|
|3|0|
+-+-+

>>> df = df.replace(float('nan'), 5)
>>> df.show()
+-+-+
|index|value|
+-+-+
|1|5|
|2|3|
|3|5|
+-+-+

>>>
{code}


> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Priority: Major
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}} s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}






[jira] [Updated] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Ayad updated SPARK-30082:
--
Priority: Critical  (was: Major)

> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Priority: Critical
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}} s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}






[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server

2019-11-29 Thread Ajith S (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985021#comment-16985021
 ] 

Ajith S commented on SPARK-27719:
-

Currently our production also encounters this issue, and I would like to work 
on this. As per the suggestion of [~hao.li], is the idea acceptable, [~dongjoon]?

> Set maxDisplayLogSize for spark history server
> --
>
> Key: SPARK-27719
> URL: https://issues.apache.org/jira/browse/SPARK-27719
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: hao.li
>Priority: Minor
>
> Sometimes a very large event log may be useless, and parsing it may waste 
> many resources.
> It may be useful to avoid parsing large event logs by setting a configuration 
> spark.history.fs.maxDisplayLogSize.






[jira] [Commented] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118
 ] 

John Ayad commented on SPARK-30082:
---

Just thought I'd update on this: the {{replace}} function seems to be 
correctly replacing {{NaN}}s. Here's a better example that also demonstrates 
that the problem is limited to columns of type {{Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], 
>>> ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|  1.0|0|
|  0.0|3|
|  NaN|0|
+-+-+>>> df.replace(float('nan'), 2).show()
+-+-+
|index|value|
+-+-+
|  1.0|2|
|  0.0|3|
|  2.0|2|
+-+-+ {code}
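As a possible workaround (a hedged suggestion, not something verified in this report): {{replace}} also takes a {{subset}} argument, so restricting the call to the float column leaves the integer column untouched while this is investigated:

{code:python}
# Hypothetical workaround using DataFrame.replace's subset parameter: only the
# 'index' column is considered, so the integer zeros in 'value' are left alone.
df.replace(float('nan'), 2, subset=['index']).show()
{code}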

> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Priority: Critical
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}} s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}






[jira] [Comment Edited] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118
 ] 

John Ayad edited comment on SPARK-30082 at 11/29/19 5:07 PM:
-

Just thought i'd update on this, the {{replace}} function seems to be, 
correctly, replacing {{NaN}}s. Here's a better example that also demonstrates 
that the problem is limited to columns of type Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], 
>>> ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|  1.0|0|
|  0.0|3|
|  NaN|0|
+-+-+>>> df.replace(float('nan'), 2).show()
+-+-+
|index|value|
+-+-+
|  1.0|2|
|  0.0|3|
|  2.0|2|
+-+-+ {code}


was (Author: jayad):
Just thought i'd update on this, the {{replace}} function seems to be, 
correctly, replacing {{NaN}}s. Here's a better example that also demonstrates 
that the problem is limited to columns of type {{Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], 
>>> ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|  1.0|0|
|  0.0|3|
|  NaN|0|
+-+-+>>> df.replace(float('nan'), 2).show()
+-+-+
|index|value|
+-+-+
|  1.0|2|
|  0.0|3|
|  2.0|2|
+-+-+ {code}

> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Priority: Critical
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}} s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}






[jira] [Comment Edited] (SPARK-30082) Zeros are being treated as NaNs

2019-11-29 Thread John Ayad (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985118#comment-16985118
 ] 

John Ayad edited comment on SPARK-30082 at 11/29/19 5:08 PM:
-

Just thought i'd update on this, the {{replace}} function seems to be, 
correctly, replacing {{NaNs. Here's a better example that also demonstrates 
that the problem is limited to columns of type Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], 
>>> ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|  1.0|0|
|  0.0|3|
|  NaN|0|
+-+-+
>>> df.replace(float('nan'), 2).show()
+-+-+
|index|value|
+-+-+
|  1.0|2|
|  0.0|3|
|  2.0|2|
+-+-+ {code}


was (Author: jayad):
Just thought i'd update on this, the {{replace}} function seems to be, 
correctly, replacing {{NaN}}s. Here's a better example that also demonstrates 
that the problem is limited to columns of type Integer}}:
{code:java}
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], 
>>> ("index", "value"))
>>> df.show()
+-+-+
|index|value|
+-+-+
|  1.0|0|
|  0.0|3|
|  NaN|0|
+-+-+>>> df.replace(float('nan'), 2).show()
+-+-+
|index|value|
+-+-+
|  1.0|2|
|  0.0|3|
|  2.0|2|
+-+-+ {code}

> Zeros are being treated as NaNs
> ---
>
> Key: SPARK-30082
> URL: https://issues.apache.org/jira/browse/SPARK-30082
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: John Ayad
>Priority: Critical
>
> If you attempt to run
> {code:java}
> df = df.replace(float('nan'), somethingToReplaceWith)
> {code}
> It will replace all {{0}} s in columns of type {{Integer}}
> Example code snippet to repro this:
> {code:java}
> from pyspark.sql import SQLContext
> spark = SQLContext(sc).sparkSession
> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> df.show()
> df = df.replace(float('nan'), 5)
> df.show()
> {code}
> Here's the output I get when I run this code:
> {code:java}
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.4.4
>   /_/
> Using Python version 3.7.5 (default, Nov  1 2019 02:16:32)
> SparkSession available as 'spark'.
> >>> from pyspark.sql import SQLContext
> >>> spark = SQLContext(sc).sparkSession
> >>> df = spark.createDataFrame([(1, 0), (2, 3), (3, 0)], ("index", "value"))
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|0|
> |2|3|
> |3|0|
> +-+-+
> >>> df = df.replace(float('nan'), 5)
> >>> df.show()
> +-+-+
> |index|value|
> +-+-+
> |1|5|
> |2|3|
> |3|5|
> +-+-+
> >>>
> {code}






[jira] [Created] (SPARK-30083) visitArithmeticUnary should wrap PLUS case with UnaryPositive for type checking

2019-11-29 Thread Kent Yao (Jira)
Kent Yao created SPARK-30083:


 Summary: visitArithmeticUnary should wrap PLUS case with 
UnaryPositive for type checking
 Key: SPARK-30083
 URL: https://issues.apache.org/jira/browse/SPARK-30083
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Kent Yao


For the PLUS case, visitArithmeticUnary does not wrap the expression with 
UnaryPositive, so it escapes type checking.
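A hypothetical illustration of the asymmetry (run from PySpark with a SparkSession named {{spark}}; the exact error wording is an assumption):

{code:python}
# Unary minus is wrapped in UnaryMinus, so a non-numeric operand is expected to
# be rejected during analysis; unary plus currently parses to just the operand
# itself and therefore skips the same type check.
spark.sql("SELECT -map(1, 2)")  # expected to fail type checking
spark.sql("SELECT +map(1, 2)")  # currently succeeds, returning map(1, 2)
{code}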






[jira] [Commented] (SPARK-30063) Failure when returning a value from multiple Pandas UDFs

2019-11-29 Thread Ruben Berenguel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985192#comment-16985192
 ] 

Ruben Berenguel commented on SPARK-30063:
-

Hi [~tkellogg], I'd like to have a look - do you have some small, shareable 
reproducible data/code? Otherwise it's a bit hard to pinpoint on which side 
(Spark, Arrow-Spark, Python) the problem lies (it may well be a combination of 
the three). My hunch is that either the schema is being passed incorrectly (as 
in your related bug) or, conversely, the schema is passed correctly and the 
data incorrectly (in a different order). When that happens, the Arrow reader 
on the JVM side can't make sense of the message it receives, and the error 
would look like this one.

> Failure when returning a value from multiple Pandas UDFs
> 
>
> Key: SPARK-30063
> URL: https://issues.apache.org/jira/browse/SPARK-30063
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.3, 2.4.4
> Environment: Happens on Mac & Ubuntu (Docker). Seems to happen on 
> both 2.4.3 and 2.4.4
>Reporter: Tim Kellogg
>Priority: Major
> Attachments: spark-debug.txt
>
>
> I have 20 Pandas UDFs that I'm trying to evaluate all at the same time.
>  * PandasUDFType.GROUPED_AGG
>  * 3 columns in the input data frame being serialized over Arrow to Python 
> worker. See below for clarification.
>  * All functions take 2 parameters, some combination of the 3 received as 
> Arrow input.
>  * Varying return types, see details below.
> _*I get an IllegalArgumentException on the Scala side of the worker when 
> deserializing from Python.*_
> h2. Exception & Stack Trace
> {code:java}
> 19/11/27 11:38:36 ERROR Executor: Exception in task 0.0 in stage 5.0 (TID 5)
> java.lang.IllegalArgumentException
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543)
>   at 
> org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58)
>   at 
> org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowStreamReader.java:132)
>   at 
> org.apache.arrow.vector.ipc.ArrowReader.initialize(ArrowReader.java:181)
>   at 
> org.apache.arrow.vector.ipc.ArrowReader.ensureInitialized(ArrowReader.java:172)
>   at 
> org.apache.arrow.vector.ipc.ArrowReader.getVectorSchemaRoot(ArrowReader.java:65)
>   at 
> org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:162)
>   at 
> org.apache.spark.sql.execution.python.ArrowPythonRunner$$anon$1.read(ArrowPythonRunner.scala:122)
>   at 
> org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
>   at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
>   at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:836)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
>   at org.apache.spark.scheduler.Task.run(Task.scala:123)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> 19/11/27 11:38:36 WARN TaskSetManager: Lost task 0.0 in stage 5.0 (TID 5, 
> localhost, executor driver): java.lang.IllegalArgumentException
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:334)
>   at 
> org.apache.arrow.vector.ipc.message.MessageSerializer.readMessage(MessageSerializer.java:543)
>   at 
> org.apache.arrow.vector.ipc.message.MessageChannelReader.readNext(MessageChannelReader.java:58)
>   at 
> org.apache.arrow.vector.ipc.ArrowStreamReader.readSchema(ArrowSt
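For reference, a minimal sketch of the kind of setup the quoted report describes (several GROUPED_AGG pandas UDFs evaluated in one aggregation over a few input columns); the column names and functions are hypothetical, not the reporter's code:

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame(
    [("a", 1.0, 2.0, 3.0), ("a", 4.0, 5.0, 6.0), ("b", 7.0, 8.0, 9.0)],
    ("key", "x", "y", "z"))

@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def mean_diff(x, y):
    # grouped-aggregate UDF taking two of the input columns
    return (x - y).mean()

@pandas_udf("double", PandasUDFType.GROUPED_AGG)
def sum_ratio(y, z):
    return (y / z).sum()

# several grouped-aggregate pandas UDFs evaluated in the same agg() call
df.groupBy("key").agg(mean_diff(df.x, df.y), sum_ratio(df.y, df.z)).show()
{code}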

[jira] [Created] (SPARK-30084) Add docs showing how to automatically rebuild Python API docs

2019-11-29 Thread Nicholas Chammas (Jira)
Nicholas Chammas created SPARK-30084:


 Summary: Add docs showing how to automatically rebuild Python API 
docs
 Key: SPARK-30084
 URL: https://issues.apache.org/jira/browse/SPARK-30084
 Project: Spark
  Issue Type: Improvement
  Components: Build, Documentation
Affects Versions: 3.0.0
Reporter: Nicholas Chammas


`jekyll serve --watch` doesn't watch the API docs. That means you have to kill 
and restart jekyll every time you update your API docs, just to see the effect.






[jira] [Commented] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI

2019-11-29 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985242#comment-16985242
 ] 

Gengliang Wang commented on SPARK-29724:


This issue is resolved in https://github.com/apache/spark/pull/26378

> Support JDBC/ODBC tab for HistoryServer WebUI
> -
>
> Key: SPARK-29724
> URL: https://issues.apache.org/jira/browse/SPARK-29724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Major
>
> Support JDBC/ODBC tab for HistoryServerWebUI






[jira] [Resolved] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener

2019-11-29 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-29726.

  Assignee: shahid
Resolution: Fixed

> Support KV store for listener HiveThriftServer2Listener
> ---
>
> Key: SPARK-29726
> URL: https://issues.apache.org/jira/browse/SPARK-29726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Minor
>
> Support KVstore for HiveThriftServer2Listener






[jira] [Commented] (SPARK-29726) Support KV store for listener HiveThriftServer2Listener

2019-11-29 Thread Gengliang Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985243#comment-16985243
 ] 

Gengliang Wang commented on SPARK-29726:


This issue is resolved in https://github.com/apache/spark/pull/26378

> Support KV store for listener HiveThriftServer2Listener
> ---
>
> Key: SPARK-29726
> URL: https://issues.apache.org/jira/browse/SPARK-29726
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Minor
>
> Support KVstore for HiveThriftServer2Listener






[jira] [Resolved] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI

2019-11-29 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-29724.

Resolution: Fixed

> Support JDBC/ODBC tab for HistoryServer WebUI
> -
>
> Key: SPARK-29724
> URL: https://issues.apache.org/jira/browse/SPARK-29724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Priority: Major
>
> Support JDBC/ODBC tab for HistoryServerWebUI






[jira] [Assigned] (SPARK-29724) Support JDBC/ODBC tab for HistoryServer WebUI

2019-11-29 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-29724:
--

Assignee: shahid

> Support JDBC/ODBC tab for HistoryServer WebUI
> -
>
> Key: SPARK-29724
> URL: https://issues.apache.org/jira/browse/SPARK-29724
> Project: Spark
>  Issue Type: Sub-task
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: shahid
>Assignee: shahid
>Priority: Major
>
> Support JDBC/ODBC tab for HistoryServerWebUI






[jira] [Resolved] (SPARK-29991) Support `test-hive1.2` in PR Builder

2019-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-29991.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 26710
[https://github.com/apache/spark/pull/26710]

> Support `test-hive1.2` in PR Builder
> 
>
> Key: SPARK-29991
> URL: https://issues.apache.org/jira/browse/SPARK-29991
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-27719) Set maxDisplayLogSize for spark history server

2019-11-29 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16985270#comment-16985270
 ] 

Jungtaek Lim commented on SPARK-27719:
--

If your application is a streaming one, I think we are already taking the right 
approach to deal with it - see https://issues.apache.org/jira/browse/SPARK-28594.

The point is, normally we don't want to stop reading the event log at some 
offset/size, as what we are really interested in is the "latest" status. And 
there are some events which we shouldn't ignore or clean up - app start, app 
end, environment update, etc.

> Set maxDisplayLogSize for spark history server
> --
>
> Key: SPARK-27719
> URL: https://issues.apache.org/jira/browse/SPARK-27719
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.0.0
>Reporter: hao.li
>Priority: Minor
>
> Sometimes a very large event log may be useless, and parsing it may waste 
> many resources.
> It may be useful to avoid parsing large event logs by setting a configuration 
> spark.history.fs.maxDisplayLogSize.






[jira] [Created] (SPARK-30085) standardize partition spec in sql reference

2019-11-29 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-30085:
--

 Summary: standardize partition spec in sql reference
 Key: SPARK-30085
 URL: https://issues.apache.org/jira/browse/SPARK-30085
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, SQL
Affects Versions: 3.0.0
Reporter: Huaxin Gao


Use the same partition spec for all the sql reference docs. 
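For context, the "partition spec" is the {{PARTITION (...)}} clause that several statements in the SQL reference share. A hedged example of the form being standardized (hypothetical table and column names; a SparkSession named {{spark}} is assumed):

{code:python}
# Hypothetical table/partition names; the PARTITION (key=value, ...) clause is
# the partition spec that the SQL reference docs should present consistently.
spark.sql("ALTER TABLE sales DROP PARTITION (year=2019, month=11)")
spark.sql("SHOW PARTITIONS sales PARTITION (year=2019)")
{code}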






[jira] [Updated] (SPARK-29579) Guarantee compatibility of snapshot (live entities, KVstore entities)

2019-11-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-29579:
-
Parent: (was: SPARK-28594)
Issue Type: Task  (was: Sub-task)

> Guarantee compatibility of snapshot (live entities, KVstore entities)
> -
>
> Key: SPARK-29579
> URL: https://issues.apache.org/jira/browse/SPARK-29579
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> This issue is a follow-up to SPARK-29111 and SPARK-29261, neither of which 
> guarantees compatibility.
> To safely clean up old event log files after a snapshot has been written for 
> these files, we have to ensure the snapshot file can restore the same state 
> as replaying those event log files would. The issue arises when migrating to 
> a newer Spark version - if the snapshot is not readable due to 
> incompatibility, the app cannot be read at all, as we've already removed the 
> old event log files. If we can guarantee compatibility, we can move on to the 
> next item: cleaning up old event log files to save space.






[jira] [Updated] (SPARK-29579) Guarantee compatibility of snapshot (live entities, KVstore entities)

2019-11-29 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-29579:
-
Parent: SPARK-28870
Issue Type: Sub-task  (was: Task)

> Guarantee compatibility of snapshot (live entities, KVstore entities)
> -
>
> Key: SPARK-29579
> URL: https://issues.apache.org/jira/browse/SPARK-29579
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Jungtaek Lim
>Priority: Major
>
> This issue is a follow-up to SPARK-29111 and SPARK-29261, neither of which 
> guarantees compatibility.
> To safely clean up old event log files after a snapshot has been written for 
> these files, we have to ensure the snapshot file can restore the same state 
> as replaying those event log files would. The issue arises when migrating to 
> a newer Spark version - if the snapshot is not readable due to 
> incompatibility, the app cannot be read at all, as we've already removed the 
> old event log files. If we can guarantee compatibility, we can move on to the 
> next item: cleaning up old event log files to save space.






[jira] [Assigned] (SPARK-29991) Support `test-hive1.2` in PR Builder

2019-11-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-29991:


Assignee: Hyukjin Kwon  (was: Dongjoon Hyun)

> Support `test-hive1.2` in PR Builder
> 
>
> Key: SPARK-29991
> URL: https://issues.apache.org/jira/browse/SPARK-29991
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>



