[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28200:


Assignee: Apache Spark

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Assignee: Apache Spark
>Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we 
> currently do not check for overflow when serializing a Java/Scala 
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there as well.
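A minimal sketch of the kind of check that could be added (not from the issue; it assumes Spark's internal `Decimal`/`DecimalType` API, and the actual integration point in `ScalaReflection` may differ):

{code:scala}
import org.apache.spark.sql.types.{Decimal, DecimalType}

// Hypothetical helper: convert a BigDecimal, failing loudly on overflow.
def toDecimalChecked(d: java.math.BigDecimal, dt: DecimalType): Decimal = {
  val converted = Decimal(d)
  // changePrecision returns false when the value does not fit
  if (!converted.changePrecision(dt.precision, dt.scale)) {
    throw new ArithmeticException(
      s"$d cannot be represented as Decimal(${dt.precision}, ${dt.scale})")
  }
  converted
}
{code}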



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28200:


Assignee: (was: Apache Spark)

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we 
> currently do not check for overflow when serializing a Java/Scala 
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there as well.






[jira] [Commented] (SPARK-25692) Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext

2019-06-29 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875691#comment-16875691
 ] 

Dongjoon Hyun commented on SPARK-25692:
---

I updated the JIRA title to match the last patch, which is the real fix for 
the underlying issue.

> Remove static initialization of worker eventLoop handling chunk fetch 
> requests within TransportContext
> --
>
> Key: SPARK-25692
> URL: https://issues.apache.org/jira/browse/SPARK-25692
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Sanket Reddy
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot 
> 2018-11-01 at 10.17.16 AM.png
>
>
> The whole test suite looks pretty flaky. See: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/
> This may be a regression in 3.0, as this didn't happen in the 2.4 branch.






[jira] [Updated] (SPARK-25692) Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext

2019-06-29 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-25692:
--
Summary: Remove static initialization of worker eventLoop handling chunk 
fetch requests within TransportContext  (was: Flaky test: 
ChunkFetchIntegrationSuite.fetchBothChunks)

> Remove static initialization of worker eventLoop handling chunk fetch 
> requests within TransportContext
> --
>
> Key: SPARK-25692
> URL: https://issues.apache.org/jira/browse/SPARK-25692
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Shixiong Zhu
>Assignee: Sanket Reddy
>Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot 
> 2018-11-01 at 10.17.16 AM.png
>
>
> The whole test suite looks pretty flaky. See: 
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/
> This may be a regression in 3.0, as this didn't happen in the 2.4 branch.






[jira] [Resolved] (SPARK-28196) Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog

2019-06-29 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28196.
-
   Resolution: Fixed
 Assignee: Yuming Wang
Fix Version/s: 3.0.0

> Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog
> ---
>
> Key: SPARK-28196
> URL: https://issues.apache.org/jira/browse/SPARK-28196
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>
>  
> {code:scala}
> def listTables(db: String, pattern: String, includeLocalTempViews: Boolean): 
> Seq[TableIdentifier]
> def listLocalTempViews(pattern: String): Seq[TableIdentifier]
> {code}
> This is because in some cases {{listTables}} does not need to include 
> local temporary views, and in other cases we only need to list local 
> temporary views.
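Hypothetical usage of the proposed APIs (method and parameter names are taken from the signatures above; `catalog` is assumed to be a `SessionCatalog` instance):

{code:scala}
// List tables in `default`, excluding local temporary views.
val tables = catalog.listTables("default", "*", includeLocalTempViews = false)

// List only the local temporary views matching a pattern.
val tempViews = catalog.listLocalTempViews("tmp_*")
{code}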






[jira] [Resolved] (SPARK-28184) Avoid creating new sessions in SparkMetadataOperationSuite

2019-06-29 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-28184.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

> Avoid creating new sessions in SparkMetadataOperationSuite
> --
>
> Key: SPARK-28184
> URL: https://issues.apache.org/jira/browse/SPARK-28184
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Assigned] (SPARK-28184) Avoid creating new sessions in SparkMetadataOperationSuite

2019-06-29 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-28184:
---

Assignee: Yuming Wang

> Avoid creating new sessions in SparkMetadataOperationSuite
> --
>
> Key: SPARK-28184
> URL: https://issues.apache.org/jira/browse/SPARK-28184
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
>Priority: Major
> Fix For: 3.0.0
>
>







[jira] [Updated] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases

2019-06-29 Thread Sean Owen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-28170:
--
Priority: Trivial  (was: Minor)

> DenseVector .toArray() and .values documentation do not specify they are 
> aliases
> 
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 2.4.3
>Reporter: Sivam Pasupathipillai
>Priority: Trivial
>
> The documentation of the *toArray()* method and the *values* property in 
> pyspark.ml.linalg.DenseVector is confusing.
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and they both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to 
> return a Python list.






[jira] [Resolved] (SPARK-11412) Support merge schema for ORC

2019-06-29 Thread Xiao Li (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-11412.
-
   Resolution: Fixed
Fix Version/s: 3.0.0

> Support merge schema for ORC
> 
>
> Key: SPARK-11412
> URL: https://issues.apache.org/jira/browse/SPARK-11412
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 1.6.3, 2.0.0, 2.1.1, 2.2.0
>Reporter: Dave
>Priority: Major
> Fix For: 3.0.0
>
>
> When I tried to load partitioned ORC files with a slight difference in a 
> nested column, say the column:
> |-- request: struct (nullable = true)
> |    |-- datetime: string (nullable = true)
> |    |-- host: string (nullable = true)
> |    |-- ip: string (nullable = true)
> |    |-- referer: string (nullable = true)
> |    |-- request_uri: string (nullable = true)
> |    |-- uri: string (nullable = true)
> |    |-- useragent: string (nullable = true)
> And then a page_url_lists attribute appears in the later partitions.
> I tried to use
> val s = sqlContext.read.format("orc").option("mergeSchema", 
> "true").load("/data/warehouse/") to load the data.
> But the schema doesn't show request.page_url_lists.
> I am wondering whether schema merging works for ORC at all.
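With the feature in place, ORC schema merging would presumably be requested the same way it is for Parquet. A sketch (the option name and path follow the reporter's snippet):

{code:scala}
val merged = spark.read
  .format("orc")
  .option("mergeSchema", "true")
  .load("/data/warehouse/")

// The merged schema would now include request.page_url_lists.
merged.printSchema()
{code}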






[jira] [Updated] (SPARK-28217) Allow a pluggable statistics plan visitor for a logical plan.

2019-06-29 Thread Terry Kim (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Terry Kim updated SPARK-28217:
--
Summary: Allow a pluggable statistics plan visitor for a logical plan.  
(was: Allow a custom statistics logical plan visitor to be plugged in.)

> Allow a pluggable statistics plan visitor for a logical plan.
> -
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors: 
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a 
> bit limiting, since there is no way to plug in a custom plan visitor, from 
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can specify to override the 
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", 
> "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat 
> plan visitor.
> {code}






[jira] [Assigned] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28217:


Assignee: (was: Apache Spark)

> Allow a custom statistics logical plan visitor to be plugged in.
> 
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Priority: Major
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors: 
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a 
> bit limiting, since there is no way to plug in a custom plan visitor, from 
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can specify to override the 
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", 
> "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat 
> plan visitor.
> {code}






[jira] [Assigned] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28217:


Assignee: Apache Spark

> Allow a custom statistics logical plan visitor to be plugged in.
> 
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Major
>   Original Estimate: 120h
>  Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors: 
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a 
> bit limiting, since there is no way to plug in a custom plan visitor, from 
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can specify to override the 
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", 
> "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat 
> plan visitor.
> {code}






[jira] [Created] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.

2019-06-29 Thread Terry Kim (JIRA)
Terry Kim created SPARK-28217:
-

 Summary: Allow a custom statistics logical plan visitor to be 
plugged in.
 Key: SPARK-28217
 URL: https://issues.apache.org/jira/browse/SPARK-28217
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim


Spark currently has two built-in statistics plan visitors: 
SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a 
bit limiting, since there is no way to plug in a custom plan visitor, from which 
a custom query optimizer could benefit.

We can provide a Spark conf that the user can specify to override the built-in 
plan visitor:

{code:scala}
// First create your custom stat plan visitor.
class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
  // Implement LogicalPlanVisitor[Statistics] trait
}

// Set the visitor via Spark conf.
spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", "MyStatsPlanVisitor")

// Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat 
plan visitor.
{code}
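A sketch (not from the proposal) of how such a conf could be honored when computing statistics; `Utils.classForName` mirrors how Spark instantiates other pluggable classes, but the actual wiring would live inside Catalyst:

{code:scala}
import org.apache.spark.util.Utils

// Hypothetical: instantiate the visitor class named by the proposed conf.
def loadStatsVisitor(className: String): LogicalPlanVisitor[Statistics] =
  Utils.classForName(className)
    .getConstructor()
    .newInstance()
    .asInstanceOf[LogicalPlanVisitor[Statistics]]
{code}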







[jira] [Commented] (SPARK-28200) Decimal overflow handling in ExpressionEncoder

2019-06-29 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875568#comment-16875568
 ] 

Josh Rosen commented on SPARK-28200:


MickJermsurawong and I have a patch for this, including tests covering both 
ExpressionEncoder and RowEncoder; we'll submit a PR early next week.

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we 
> currently do not check for overflow when serializing a Java/Scala 
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there as well.






[jira] [Commented] (SPARK-27027) from_avro function does not deserialize the Avro record of a struct column type correctly

2019-06-29 Thread Hien Luu (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875549#comment-16875549
 ] 

Hien Luu commented on SPARK-27027:
--

FYI - this issue is still reproducible in Spark 2.4.3 (from the console, using 
the ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.3 command).

> from_avro function does not deserialize the Avro record of a struct column 
> type correctly
> -
>
> Key: SPARK-27027
> URL: https://issues.apache.org/jira/browse/SPARK-27027
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, SQL
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Hien Luu
>Priority: Minor
>
> The {{from_avro}} function produces wrong output for a struct field. See the 
> output at the bottom of the description.
> {code}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.avro._
> import org.apache.spark.sql.functions._
> spark.version
> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25), (3, "Josh Duke", 
> 50)).toDF("id", "name", "age")
> val dfStruct = df.withColumn("value", struct("name","age"))
> dfStruct.show
> dfStruct.printSchema
> val dfKV = dfStruct.select(to_avro('id).as("key"), 
> to_avro('value).as("value"))
> val expectedSchema = StructType(Seq(StructField("name", StringType, 
> true),StructField("age", IntegerType, false)))
> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
> val avroTypeStr = s"""
>  |{
>  | "type": "int",
>  | "name": "key"
>  |}
>  """.stripMargin
> dfKV.select(from_avro('key, avroTypeStr)).show
> dfKV.select(from_avro('value, avroTypeStruct)).show
> // output for the last statement, and it is not correct
> +------------------------+
> |from_avro(value, struct)|
> +------------------------+
> |         [Josh Duke, 50]|
> |         [Josh Duke, 50]|
> |         [Josh Duke, 50]|
> +------------------------+
> {code}






[jira] [Updated] (SPARK-28216) Add calculate local directory size to SQLTestUtils

2019-06-29 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28216:

Description: 
We can move {{getDataSize}} from {{StatisticsCollectionTestBase}} to 
{{SQLTestUtils}} and makes it more common.

We can avoid these changes after move it:
[!https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png!|https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png]
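For reference, a minimal sketch of what such a directory-size helper typically looks like (hypothetical; the real {{getDataSize}} in {{StatisticsCollectionTestBase}} may differ):

{code:scala}
import java.io.File

// Hypothetical: sum the sizes of the data files directly under a directory,
// skipping hidden bookkeeping files such as _SUCCESS or .crc files.
def getDataSize(dir: File): Long =
  dir.listFiles
    .filter(f => f.isFile && !f.getName.startsWith(".") && !f.getName.startsWith("_"))
    .map(_.length)
    .sum
{code}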

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We can move {{getDataSize}} from {{StatisticsCollectionTestBase}} to 
> {{SQLTestUtils}} and make it more generally available.
> Moving it lets us avoid changes like these:
> [!https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png!|https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png]






[jira] [Assigned] (SPARK-28216) Add calculate local directory size to SQLTestUtils

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28216:


Assignee: (was: Apache Spark)

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-28216) Add calculate local directory size to SQLTestUtils

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28216:


Assignee: Apache Spark

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-28216) Add calculate local directory size to SQLTestUtils

2019-06-29 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28216:
---

 Summary: Add calculate local directory size to SQLTestUtils
 Key: SPARK-28216
 URL: https://issues.apache.org/jira/browse/SPARK-28216
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang









[jira] [Assigned] (SPARK-28215) as_tibble was removed from Arrow R API

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28215:


Assignee: Apache Spark

> as_tibble was removed from Arrow R API
> --
>
> Key: SPARK-28215
> URL: https://issues.apache.org/jira/browse/SPARK-28215
> Project: Spark
>  Issue Type: Bug
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Apache Spark
>Priority: Major
>
> The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R 
> no longer works due to this change.






[jira] [Assigned] (SPARK-28215) as_tibble was removed from Arrow R API

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28215:


Assignee: (was: Apache Spark)

> as_tibble was removed from Arrow R API
> --
>
> Key: SPARK-28215
> URL: https://issues.apache.org/jira/browse/SPARK-28215
> Project: Spark
>  Issue Type: Bug
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>
> The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R 
> no longer works due to this change.






[jira] [Created] (SPARK-28215) as_tibble was removed from Arrow R API

2019-06-29 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-28215:
---

 Summary: as_tibble was removed from Arrow R API
 Key: SPARK-28215
 URL: https://issues.apache.org/jira/browse/SPARK-28215
 Project: Spark
  Issue Type: Bug
  Components: R
Affects Versions: 3.0.0
Reporter: Liang-Chi Hsieh


The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R 
no longer works due to this change.








[jira] [Assigned] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28170:


Assignee: Apache Spark

> DenseVector .toArray() and .values documentation do not specify they are 
> aliases
> 
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 2.4.3
>Reporter: Sivam Pasupathipillai
>Assignee: Apache Spark
>Priority: Minor
>
> The documentation of the *toArray()* method and the *values* property in 
> pyspark.ml.linalg.DenseVector is confusing.
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and they both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to 
> return a Python list.






[jira] [Assigned] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28170:


Assignee: (was: Apache Spark)

> DenseVector .toArray() and .values documentation do not specify they are 
> aliases
> 
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, MLlib, PySpark
>Affects Versions: 2.4.3
>Reporter: Sivam Pasupathipillai
>Priority: Minor
>
> The documentation of the *toArray()* method and the *values* property in 
> pyspark.ml.linalg.DenseVector is confusing.
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and they both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to 
> return a Python list.






[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null

2019-06-29 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875447#comment-16875447
 ] 

Marco Gaido commented on SPARK-28186:
-

This is the right behavior AFAIK. Why are you saying it is wrong?

> array_contains returns null instead of false when one of the items in the 
> array is null
> ---
>
> Key: SPARK-28186
> URL: https://issues.apache.org/jira/browse/SPARK-28186
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.0
>Reporter: Alex Kushnir
>Priority: Major
>
> If an array contains a null item, then array_contains returns true when the 
> item is found, but when the item is not found it returns null instead of false.
> Seq(
> (1, Seq("a", "b", "c")),
> (2, Seq("a", "b", null, "c"))
> ).toDF("id", "vals").createOrReplaceTempView("tbl")
> spark.sql("select id, vals, array_contains(vals, 'a') as has_a, 
> array_contains(vals, 'd') as has_d from tbl").show
> +---+----------+-----+-----+
> | id|      vals|has_a|has_d|
> +---+----------+-----+-----+
> |  1| [a, b, c]| true|false|
> |  2|[a, b,, c]| true| null|
> +---+----------+-----+-----+
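This matches SQL three-valued logic: comparing 'd' with the null element yields unknown, so the overall result is null rather than false. A common workaround when false is desired (a sketch, not from the discussion):

{code:scala}
// Map the null (unknown) result of array_contains to false explicitly.
spark.sql("""
  select id, vals,
         coalesce(array_contains(vals, 'd'), false) as has_d
  from tbl
""").show
{code}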






[jira] [Assigned] (SPARK-28201) Revisit MakeDecimal behavior on overflow

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28201:


Assignee: Apache Spark

> Revisit MakeDecimal behavior on overflow
> 
>
> Key: SPARK-28201
> URL: https://issues.apache.org/jira/browse/SPARK-28201
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Assignee: Apache Spark
>Priority: Major
>
> As pointed out in 
> https://github.com/apache/spark/pull/20350#issuecomment-505997469, in special 
> cases of decimal aggregation we use the `MakeDecimal` operator.
> This operator's behavior on overflow is not well defined; what it does 
> currently is:
>  - if codegen is enabled, it returns null;
>  - in interpreted mode, it throws an `IllegalArgumentException`.
> We should make its behavior consistent with other similar cases, and in 
> particular we should honor the value of the conf introduced in SPARK-23179 
> and behave accordingly, i.e.:
>  - return null if the flag is true;
>  - throw an `ArithmeticException` if the flag is false.
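The two proposed branches can be sketched as follows (hypothetical helper; the real `MakeDecimal` operates on an unscaled Long, and the flag corresponds to the SPARK-23179 conf):

{code:scala}
// Hypothetical: uniform overflow handling gated on nullOnOverflow.
def handleOverflow(value: Decimal, precision: Int, scale: Int,
                   nullOnOverflow: Boolean): Decimal = {
  if (value.changePrecision(precision, scale)) value
  else if (nullOnOverflow) null          // flag true: return null
  else throw new ArithmeticException(    // flag false: fail
    s"$value cannot be represented as Decimal($precision, $scale)")
}
{code}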






[jira] [Assigned] (SPARK-28201) Revisit MakeDecimal behavior on overflow

2019-06-29 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-28201:


Assignee: (was: Apache Spark)

> Revisit MakeDecimal behavior on overflow
> 
>
> Key: SPARK-28201
> URL: https://issues.apache.org/jira/browse/SPARK-28201
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Marco Gaido
>Priority: Major
>
> As pointed out in 
> https://github.com/apache/spark/pull/20350#issuecomment-505997469, in special 
> cases of decimal aggregation we use the `MakeDecimal` operator.
> This operator's behavior on overflow is not well defined; what it does 
> currently is:
>  - if codegen is enabled, it returns null;
>  - in interpreted mode, it throws an `IllegalArgumentException`.
> We should make its behavior consistent with other similar cases, and in 
> particular we should honor the value of the conf introduced in SPARK-23179 
> and behave accordingly, i.e.:
>  - return null if the flag is true;
>  - throw an `ArithmeticException` if the flag is false.


