[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28200:

    Assignee: Apache Spark

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Marco Gaido
> Assignee: Apache Spark
> Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we are
> currently not checking for overflow when serializing a java/scala
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there too.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
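The check the issue asks for can be sketched as follows. This is an illustrative helper, not Spark's actual `ScalaReflection`/`ExpressionEncoder` code (the name `fitsDecimal` is an assumption): a `BigDecimal` fits `DECIMAL(precision, scale)` when, after rounding to `scale` fractional digits, it has at most `precision` total digits.

```scala
// Hypothetical sketch of the missing overflow check; not Spark's real API.
def fitsDecimal(d: BigDecimal, precision: Int, scale: Int): Option[BigDecimal] = {
  // Rescale first, then count digits: after rounding to `scale` fractional
  // digits the value must have at most `precision` digits in total.
  val rescaled = d.setScale(scale, BigDecimal.RoundingMode.HALF_UP)
  if (rescaled.precision <= precision) Some(rescaled) // representable
  else None                                           // overflow: caller nulls or throws
}

// 123.456 fits DECIMAL(6, 3); 12345.6 needs 8 digits at scale 3, so it overflows.
assert(fitsDecimal(BigDecimal("123.456"), 6, 3).isDefined)
assert(fitsDecimal(BigDecimal("12345.6"), 6, 3).isEmpty)
```

Whether an overflow becomes a null or an exception would then be a policy decision at the call site, as discussed in the linked PR.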
[jira] [Assigned] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28200:

    Assignee: (was: Apache Spark)

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Marco Gaido
> Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we are
> currently not checking for overflow when serializing a java/scala
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there too.
[jira] [Commented] (SPARK-25692) Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875691#comment-16875691 ]

Dongjoon Hyun commented on SPARK-25692:
---

I updated the JIRA title according to the last patch, which is the real fix of the underlying issue.

> Remove static initialization of worker eventLoop handling chunk fetch
> requests within TransportContext
> --
>
> Key: SPARK-25692
> URL: https://issues.apache.org/jira/browse/SPARK-25692
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Shixiong Zhu
> Assignee: Sanket Reddy
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot
> 2018-11-01 at 10.17.16 AM.png
>
>
> Looks like the whole test suite is pretty flaky. See:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/
> This may be a regression in 3.0 as this didn't happen in 2.4 branch.
[jira] [Updated] (SPARK-25692) Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext
[ https://issues.apache.org/jira/browse/SPARK-25692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-25692:
--

Summary: Remove static initialization of worker eventLoop handling chunk fetch requests within TransportContext (was: Flaky test: ChunkFetchIntegrationSuite.fetchBothChunks)

> Remove static initialization of worker eventLoop handling chunk fetch
> requests within TransportContext
> --
>
> Key: SPARK-25692
> URL: https://issues.apache.org/jira/browse/SPARK-25692
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.0.0
> Reporter: Shixiong Zhu
> Assignee: Sanket Reddy
> Priority: Blocker
> Fix For: 3.0.0
>
> Attachments: Screen Shot 2018-10-22 at 4.12.41 PM.png, Screen Shot
> 2018-11-01 at 10.17.16 AM.png
>
>
> Looks like the whole test suite is pretty flaky. See:
> https://amplab.cs.berkeley.edu/jenkins/job/spark-master-test-maven-hadoop-2.6/5490/testReport/junit/org.apache.spark.network/ChunkFetchIntegrationSuite/history/
> This may be a regression in 3.0 as this didn't happen in 2.4 branch.
[jira] [Resolved] (SPARK-28196) Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-28196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-28196.
-
Resolution: Fixed
Assignee: Yuming Wang
Fix Version/s: 3.0.0

> Add a new `listTables` and `listLocalTempViews` APIs for SessionCatalog
> ---
>
> Key: SPARK-28196
> URL: https://issues.apache.org/jira/browse/SPARK-28196
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
>
> {code:scala}
> def listTables(db: String, pattern: String, includeLocalTempViews: Boolean): Seq[TableIdentifier]
> def listLocalTempViews(pattern: String): Seq[TableIdentifier]
> {code}
> Because in some cases {{listTables}} does not need local temporary views, and
> sometimes we only need to list local temporary views.
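The intent of the split can be illustrated with a toy model. Everything below is an assumption for illustration: the in-memory "catalog" (two plain sequences) stands in for `SessionCatalog`, and only the method shapes match the ticket.

```scala
// Toy sketch: listTables can include or exclude session-local temp views,
// and listLocalTempViews lists only the latter. Not Spark's implementation.
case class TableIdentifier(table: String)

val persistentTables = Seq("t1", "t2")    // stand-in for tables in a database
val sessionTempViews = Seq("tmp1", "view2") // stand-in for local temp views

// Glob-style pattern ("*" wildcard), as SessionCatalog-style listings use.
private def matches(pattern: String, name: String): Boolean =
  pattern.replace("*", ".*").r.pattern.matcher(name).matches()

def listLocalTempViews(pattern: String): Seq[TableIdentifier] =
  sessionTempViews.collect { case n if matches(pattern, n) => TableIdentifier(n) }

def listTables(db: String, pattern: String,
               includeLocalTempViews: Boolean): Seq[TableIdentifier] = {
  // `db` is ignored in this toy model; the real catalog would scope by database.
  val base = persistentTables.collect { case n if matches(pattern, n) => TableIdentifier(n) }
  if (includeLocalTempViews) base ++ listLocalTempViews(pattern) else base
}

assert(listTables("default", "*", includeLocalTempViews = false).map(_.table) == Seq("t1", "t2"))
assert(listTables("default", "*", includeLocalTempViews = true).map(_.table).contains("tmp1"))
assert(listLocalTempViews("tmp*").map(_.table) == Seq("tmp1"))
```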
[jira] [Resolved] (SPARK-28184) Avoid creating new sessions in SparkMetadataOperationSuite
[ https://issues.apache.org/jira/browse/SPARK-28184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-28184.
-
Resolution: Fixed
Fix Version/s: 3.0.0

> Avoid creating new sessions in SparkMetadataOperationSuite
> --
>
> Key: SPARK-28184
> URL: https://issues.apache.org/jira/browse/SPARK-28184
> Project: Spark
> Issue Type: Improvement
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
[jira] [Assigned] (SPARK-28184) Avoid creating new sessions in SparkMetadataOperationSuite
[ https://issues.apache.org/jira/browse/SPARK-28184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li reassigned SPARK-28184:
---

Assignee: Yuming Wang

> Avoid creating new sessions in SparkMetadataOperationSuite
> --
>
> Key: SPARK-28184
> URL: https://issues.apache.org/jira/browse/SPARK-28184
> Project: Spark
> Issue Type: Improvement
> Components: Tests
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Yuming Wang
> Priority: Major
> Fix For: 3.0.0
>
[jira] [Updated] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases
[ https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-28170:
--
Priority: Trivial (was: Minor)

> DenseVector .toArray() and .values documentation do not specify they are
> aliases
>
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, PySpark
> Affects Versions: 2.4.3
> Reporter: Sivam Pasupathipillai
> Priority: Trivial
>
> The documentation of the *toArray()* method and the *values* property in
> pyspark.ml.linalg.DenseVector is confusing:
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to
> return a Python list.
[jira] [Resolved] (SPARK-11412) Support merge schema for ORC
[ https://issues.apache.org/jira/browse/SPARK-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Li resolved SPARK-11412.
-
Resolution: Fixed
Fix Version/s: 3.0.0

> Support merge schema for ORC
>
>
> Key: SPARK-11412
> URL: https://issues.apache.org/jira/browse/SPARK-11412
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Affects Versions: 1.6.3, 2.0.0, 2.1.1, 2.2.0
> Reporter: Dave
> Priority: Major
> Fix For: 3.0.0
>
>
> When I tried to load partitioned ORC files with a slight difference in a
> nested column, say:
> |-- request: struct (nullable = true)
> |    |-- datetime: string (nullable = true)
> |    |-- host: string (nullable = true)
> |    |-- ip: string (nullable = true)
> |    |-- referer: string (nullable = true)
> |    |-- request_uri: string (nullable = true)
> |    |-- uri: string (nullable = true)
> |    |-- useragent: string (nullable = true)
> And then there's a page_url_lists attribute in the later partitions.
> I tried to use
> val s = sqlContext.read.format("orc").option("mergeSchema", "true").load("/data/warehouse/")
> to load the data, but the schema doesn't show request.page_url_lists.
> I am wondering if schema merge doesn't work for ORC?
[jira] [Updated] (SPARK-28217) Allow a pluggable statistics plan visitor for a logical plan.
[ https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Terry Kim updated SPARK-28217:
--
Summary: Allow a pluggable statistics plan visitor for a logical plan. (was: Allow a custom statistics logical plan visitor to be plugged in.)

> Allow a pluggable statistics plan visitor for a logical plan.
> -
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Terry Kim
> Priority: Major
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors:
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a
> bit limited since there is no way to plug in a custom plan visitor, from
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can set to override the
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat plan visitor.
> {code}
[jira] [Assigned] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.
[ https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28217:

    Assignee: (was: Apache Spark)

> Allow a custom statistics logical plan visitor to be plugged in.
>
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Terry Kim
> Priority: Major
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors:
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a
> bit limited since there is no way to plug in a custom plan visitor, from
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can set to override the
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat plan visitor.
> {code}
[jira] [Assigned] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.
[ https://issues.apache.org/jira/browse/SPARK-28217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28217:

    Assignee: Apache Spark

> Allow a custom statistics logical plan visitor to be plugged in.
>
>
> Key: SPARK-28217
> URL: https://issues.apache.org/jira/browse/SPARK-28217
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Terry Kim
> Assignee: Apache Spark
> Priority: Major
> Original Estimate: 120h
> Remaining Estimate: 120h
>
> Spark currently has two built-in statistics plan visitors:
> SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a
> bit limited since there is no way to plug in a custom plan visitor, from
> which a custom query optimizer could benefit.
> We can provide a Spark conf that the user can set to override the
> built-in plan visitor:
> {code:scala}
> // First create your custom stat plan visitor.
> class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
>   // Implement LogicalPlanVisitor[Statistics] trait
> }
> // Set the visitor via Spark conf.
> spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", "MyStatsPlanVisitor")
> // Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat plan visitor.
> {code}
[jira] [Created] (SPARK-28217) Allow a custom statistics logical plan visitor to be plugged in.
Terry Kim created SPARK-28217:
-

Summary: Allow a custom statistics logical plan visitor to be plugged in.
Key: SPARK-28217
URL: https://issues.apache.org/jira/browse/SPARK-28217
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Terry Kim

Spark currently has two built-in statistics plan visitors: SizeInBytesOnlyStatsPlanVisitor and BasicStatsPlanVisitor. However, this is a bit limited since there is no way to plug in a custom plan visitor, from which a custom query optimizer could benefit.

We can provide a Spark conf that the user can set to override the built-in plan visitor:

{code:scala}
// First create your custom stat plan visitor.
class MyStatsPlanVisitor extends LogicalPlanVisitor[Statistics] {
  // Implement LogicalPlanVisitor[Statistics] trait
}

// Set the visitor via Spark conf.
spark.conf.set("spark.sql.catalyst.statsPlanVisitorClass", "MyStatsPlanVisitor")

// Now, stat() on a LogicalPlan object will use MyStatsPlanVisitor as a stat plan visitor.
{code}
[jira] [Commented] (SPARK-28200) Decimal overflow handling in ExpressionEncoder
[ https://issues.apache.org/jira/browse/SPARK-28200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875568#comment-16875568 ]

Josh Rosen commented on SPARK-28200:

MickJermsurawong and I have a patch for this, including tests covering both ExpressionEncoder and RowEncoder; we'll submit a PR early next week.

> Decimal overflow handling in ExpressionEncoder
> --
>
> Key: SPARK-28200
> URL: https://issues.apache.org/jira/browse/SPARK-28200
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Marco Gaido
> Priority: Major
>
> As pointed out in https://github.com/apache/spark/pull/20350, we are
> currently not checking for overflow when serializing a java/scala
> `BigDecimal` in `ExpressionEncoder` / `ScalaReflection`.
> We should add this check there too.
[jira] [Commented] (SPARK-27027) from_avro function does not deserialize the Avro record of a struct column type correctly
[ https://issues.apache.org/jira/browse/SPARK-27027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875549#comment-16875549 ]

Hien Luu commented on SPARK-27027:
--

FYI - this issue is still reproducible in Spark 2.4.3 (from the console, using the ./bin/spark-shell --packages org.apache.spark:spark-avro_2.11:2.4.3 command)

> from_avro function does not deserialize the Avro record of a struct column
> type correctly
> -
>
> Key: SPARK-27027
> URL: https://issues.apache.org/jira/browse/SPARK-27027
> Project: Spark
> Issue Type: Bug
> Components: Spark Shell, SQL
> Affects Versions: 2.4.0, 3.0.0
> Reporter: Hien Luu
> Priority: Minor
>
> {{from_avro}} function produces wrong output of a struct field. See the
> output at the bottom of the description.
> {code}
> import org.apache.spark.sql.types._
> import org.apache.spark.sql.avro._
> import org.apache.spark.sql.functions._
> spark.version
> val df = Seq((1, "John Doe", 30), (2, "Mary Jane", 25), (3, "Josh Duke", 50)).toDF("id", "name", "age")
> val dfStruct = df.withColumn("value", struct("name","age"))
> dfStruct.show
> dfStruct.printSchema
> val dfKV = dfStruct.select(to_avro('id).as("key"), to_avro('value).as("value"))
> val expectedSchema = StructType(Seq(StructField("name", StringType, true),StructField("age", IntegerType, false)))
> val avroTypeStruct = SchemaConverters.toAvroType(expectedSchema).toString
> val avroTypeStr = s"""
> |{
> | "type": "int",
> | "name": "key"
> |}
> """.stripMargin
> dfKV.select(from_avro('key, avroTypeStr)).show
> dfKV.select(from_avro('value, avroTypeStruct)).show
> // output for the last statement, and it is not correct
> +------------------------+
> |from_avro(value, struct)|
> +------------------------+
> |         [Josh Duke, 50]|
> |         [Josh Duke, 50]|
> |         [Josh Duke, 50]|
> +------------------------+
> {code}
[jira] [Updated] (SPARK-28216) Add calculate local directory size to SQLTestUtils
[ https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang updated SPARK-28216:

Description: We can move {{getDataSize}} from {{StatisticsCollectionTestBase}} to {{SQLTestUtils}} and make it more common. We can avoid these changes after moving it: [!https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png!|https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png]

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Major
>
> We can move {{getDataSize}} from {{StatisticsCollectionTestBase}} to
> {{SQLTestUtils}} and make it more common.
> We can avoid these changes after moving it:
> [!https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png!|https://user-images.githubusercontent.com/5399861/60386910-66ca8680-9ace-11e9-8d52-e1eea38e324a.png]
[jira] [Assigned] (SPARK-28216) Add calculate local directory size to SQLTestUtils
[ https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28216:

    Assignee: (was: Apache Spark)

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Priority: Major
>
[jira] [Assigned] (SPARK-28216) Add calculate local directory size to SQLTestUtils
[ https://issues.apache.org/jira/browse/SPARK-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28216:

    Assignee: Apache Spark

> Add calculate local directory size to SQLTestUtils
> --
>
> Key: SPARK-28216
> URL: https://issues.apache.org/jira/browse/SPARK-28216
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Yuming Wang
> Assignee: Apache Spark
> Priority: Major
>
[jira] [Created] (SPARK-28216) Add calculate local directory size to SQLTestUtils
Yuming Wang created SPARK-28216:
---

Summary: Add calculate local directory size to SQLTestUtils
Key: SPARK-28216
URL: https://issues.apache.org/jira/browse/SPARK-28216
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang
[jira] [Assigned] (SPARK-28215) as_tibble was removed from Arrow R API
[ https://issues.apache.org/jira/browse/SPARK-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28215:

    Assignee: Apache Spark

> as_tibble was removed from Arrow R API
> --
>
> Key: SPARK-28215
> URL: https://issues.apache.org/jira/browse/SPARK-28215
> Project: Spark
> Issue Type: Bug
> Components: R
> Affects Versions: 3.0.0
> Reporter: Liang-Chi Hsieh
> Assignee: Apache Spark
> Priority: Major
>
> The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R
> no longer works due to this change.
[jira] [Assigned] (SPARK-28215) as_tibble was removed from Arrow R API
[ https://issues.apache.org/jira/browse/SPARK-28215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28215:

    Assignee: (was: Apache Spark)

> as_tibble was removed from Arrow R API
> --
>
> Key: SPARK-28215
> URL: https://issues.apache.org/jira/browse/SPARK-28215
> Project: Spark
> Issue Type: Bug
> Components: R
> Affects Versions: 3.0.0
> Reporter: Liang-Chi Hsieh
> Priority: Major
>
> The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R
> no longer works due to this change.
[jira] [Created] (SPARK-28215) as_tibble was removed from Arrow R API
Liang-Chi Hsieh created SPARK-28215:
---

Summary: as_tibble was removed from Arrow R API
Key: SPARK-28215
URL: https://issues.apache.org/jira/browse/SPARK-28215
Project: Spark
Issue Type: Bug
Components: R
Affects Versions: 3.0.0
Reporter: Liang-Chi Hsieh

The new R API of Arrow has removed `as_tibble`. Arrow-optimized collect in R no longer works due to this change.
[jira] [Assigned] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases
[ https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28170:

    Assignee: Apache Spark

> DenseVector .toArray() and .values documentation do not specify they are
> aliases
>
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, PySpark
> Affects Versions: 2.4.3
> Reporter: Sivam Pasupathipillai
> Assignee: Apache Spark
> Priority: Minor
>
> The documentation of the *toArray()* method and the *values* property in
> pyspark.ml.linalg.DenseVector is confusing:
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to
> return a Python list.
[jira] [Assigned] (SPARK-28170) DenseVector .toArray() and .values documentation do not specify they are aliases
[ https://issues.apache.org/jira/browse/SPARK-28170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28170:

    Assignee: (was: Apache Spark)

> DenseVector .toArray() and .values documentation do not specify they are
> aliases
>
>
> Key: SPARK-28170
> URL: https://issues.apache.org/jira/browse/SPARK-28170
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, PySpark
> Affects Versions: 2.4.3
> Reporter: Sivam Pasupathipillai
> Priority: Minor
>
> The documentation of the *toArray()* method and the *values* property in
> pyspark.ml.linalg.DenseVector is confusing:
> *toArray():* Returns a numpy.ndarray
> *values:* Returns a list of values
> However, they are actually aliases and both return a numpy.ndarray.
> FIX: either change the documentation or change the *values* property to
> return a Python list.
[jira] [Commented] (SPARK-28186) array_contains returns null instead of false when one of the items in the array is null
[ https://issues.apache.org/jira/browse/SPARK-28186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875447#comment-16875447 ]

Marco Gaido commented on SPARK-28186:
-

This is the right behavior AFAIK. Why are you saying it is wrong?

> array_contains returns null instead of false when one of the items in the
> array is null
> ---
>
> Key: SPARK-28186
> URL: https://issues.apache.org/jira/browse/SPARK-28186
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Reporter: Alex Kushnir
> Priority: Major
>
> If an array of items contains a null item, then array_contains returns true
> if the item is found; but if the item is not found, it returns null instead
> of false:
> Seq(
> (1, Seq("a", "b", "c")),
> (2, Seq("a", "b", null, "c"))
> ).toDF("id", "vals").createOrReplaceTempView("tbl")
> spark.sql("select id, vals, array_contains(vals, 'a') as has_a, array_contains(vals, 'd') as has_d from tbl").show
> +---+----------+-----+-----+
> | id|      vals|has_a|has_d|
> +---+----------+-----+-----+
> |  1| [a, b, c]| true|false|
> |  2|[a, b,, c]| true| null|
> +---+----------+-----+-----+
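The behavior the comment defends follows SQL's three-valued logic: `array_contains(vals, 'd')` is really "does any element equal 'd'", and a comparison with a NULL element is unknown, so when the item is not found but a NULL is present, the answer is NULL rather than false. A minimal sketch of those semantics, using `Option[Boolean]` with `None` standing in for SQL NULL (the function name here is illustrative, not Spark code):

```scala
// Three-valued array_contains: None plays the role of SQL NULL.
def arrayContains(arr: Seq[Option[String]], item: String): Option[Boolean] =
  if (arr.contains(Some(item))) Some(true) // found: TRUE
  else if (arr.contains(None)) None        // not found, but a NULL element: UNKNOWN
  else Some(false)                         // not found, no NULLs: FALSE

assert(arrayContains(Seq(Some("a"), Some("b"), None, Some("c")), "a") == Some(true))
assert(arrayContains(Seq(Some("a"), Some("b"), None, Some("c")), "d") == None)
assert(arrayContains(Seq(Some("a"), Some("b"), Some("c")), "d") == Some(false))
```

This mirrors the table in the report: row 2 has a NULL element, so `has_d` is null, while `has_a` is still true because 'a' was actually found.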
[jira] [Assigned] (SPARK-28201) Revisit MakeDecimal behavior on overflow
[ https://issues.apache.org/jira/browse/SPARK-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28201:

    Assignee: Apache Spark

> Revisit MakeDecimal behavior on overflow
>
>
> Key: SPARK-28201
> URL: https://issues.apache.org/jira/browse/SPARK-28201
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Marco Gaido
> Assignee: Apache Spark
> Priority: Major
>
> As pointed out in
> https://github.com/apache/spark/pull/20350#issuecomment-505997469, in special
> cases of decimal aggregation we use the `MakeDecimal` operator.
> This operator's behavior in case of overflow is not well defined; currently:
> - if codegen is enabled, it returns null;
> - in interpreted mode, it throws an `IllegalArgumentException`.
> So we should make its behavior uniform with other similar cases; in
> particular, we should honor the value of the conf introduced in SPARK-23179
> and behave accordingly, i.e.:
> - return null if the flag is true;
> - throw an `ArithmeticException` if the flag is false.
[jira] [Assigned] (SPARK-28201) Revisit MakeDecimal behavior on overflow
[ https://issues.apache.org/jira/browse/SPARK-28201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-28201:

    Assignee: (was: Apache Spark)

> Revisit MakeDecimal behavior on overflow
>
>
> Key: SPARK-28201
> URL: https://issues.apache.org/jira/browse/SPARK-28201
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Marco Gaido
> Priority: Major
>
> As pointed out in
> https://github.com/apache/spark/pull/20350#issuecomment-505997469, in special
> cases of decimal aggregation we use the `MakeDecimal` operator.
> This operator's behavior in case of overflow is not well defined; currently:
> - if codegen is enabled, it returns null;
> - in interpreted mode, it throws an `IllegalArgumentException`.
> So we should make its behavior uniform with other similar cases; in
> particular, we should honor the value of the conf introduced in SPARK-23179
> and behave accordingly, i.e.:
> - return null if the flag is true;
> - throw an `ArithmeticException` if the flag is false.
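The uniform behavior the ticket proposes can be sketched like this. Everything below is an assumption for illustration, not Spark's actual MakeDecimal implementation: a single `nullOnOverflow` flag (standing in for the conf from SPARK-23179) decides between returning null (here `None`) and throwing `ArithmeticException`.

```scala
// Hedged sketch: one flag decides the overflow policy, instead of the
// current split between codegen (null) and interpreted mode
// (IllegalArgumentException). Names here are illustrative.
def makeDecimal(unscaled: Long, precision: Int, scale: Int,
                nullOnOverflow: Boolean): Option[BigDecimal] = {
  val value = BigDecimal(BigInt(unscaled), scale) // unscaled * 10^(-scale)
  if (value.precision <= precision) Some(value)   // fits DECIMAL(precision, scale)
  else if (nullOnOverflow) None                   // flag true: overflow becomes null
  else throw new ArithmeticException(             // flag false: fail loudly
    s"Decimal overflow: $value does not fit DECIMAL($precision,$scale)")
}

assert(makeDecimal(12345L, 5, 3, nullOnOverflow = true) == Some(BigDecimal("12.345")))
assert(makeDecimal(12345L, 4, 3, nullOnOverflow = true) == None)
```

With the flag false, the same overflowing input throws `ArithmeticException`, matching the behavior proposed for the other decimal operators.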