[jira] [Commented] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17106001#comment-17106001
 ] 

Apache Spark commented on SPARK-31696:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28518
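The feature named in the title presumably adds per-annotation configuration keys following the existing `spark.kubernetes.driver.annotation.*` pattern. A minimal sketch of how such keys could be assembled for `spark-submit --conf` (the exact key prefix is an assumption based on the issue title, not confirmed from the PR):

```python
def service_annotation_confs(annotations):
    # Build per-annotation config entries for the driver service, assuming
    # a key prefix analogous to the existing
    # spark.kubernetes.driver.annotation.* configs (hypothetical spelling).
    prefix = "spark.kubernetes.driver.service.annotation."
    return {prefix + name: value for name, value in annotations.items()}
```

Each resulting key/value pair would be passed as one `--conf` argument.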

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31696:


Assignee: (was: Apache Spark)

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>










[jira] [Assigned] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31696:


Assignee: Apache Spark

> Support spark.kubernetes.driver.service.annotation
> --
>
> Key: SPARK-31696
> URL: https://issues.apache.org/jira/browse/SPARK-31696
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Created] (SPARK-31696) Support spark.kubernetes.driver.service.annotation

2020-05-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31696:
-

 Summary: Support spark.kubernetes.driver.service.annotation
 Key: SPARK-31696
 URL: https://issues.apache.org/jira/browse/SPARK-31696
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-31588) merge small files may need more common setting

2020-05-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31588.
--
Resolution: Won't Fix

> merge small files may need more common setting
> --
>
> Key: SPARK-31588
> URL: https://issues.apache.org/jira/browse/SPARK-31588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: spark:2.4.5
> hdp:2.7
>Reporter: philipse
>Priority: Major
>
> Hi,
> Spark SQL currently lets us use repartition or coalesce hints to manually
> control small files, like the following:
> /*+ REPARTITION(1) */
> /*+ COALESCE(1) */
> But this can only be tuned case by case: we need to decide whether to use
> COALESCE or REPARTITION. Can we try a more general way that removes this
> decision by setting a target file size, as Hive does?
> *Good points:*
> 1) The new partition number is derived automatically.
> 2) With an ON-OFF parameter provided, users can disable it if needed.
> 3) The parameter can be set at the cluster level instead of the user side,
> making it easier to control small files.
> 4) It greatly reduces the pressure on the NameNode.
>  
> *Not good points:*
> 1) It adds a new task to calculate the target file number from statistics
> of the output files.
>  
> I don't know whether we have planned this for the future.
>  
> Thanks
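The proposal above boils down to deriving the output file count from the estimated output size and a cluster-level target file size. A hypothetical helper illustrating the arithmetic (names are ours, not Spark configuration):

```python
import math

def target_partitions(total_output_bytes, target_file_bytes=128 * 1024 * 1024):
    # Pick the number of output files so that each lands near the target
    # size; always at least one file. The 128 MiB default mirrors a common
    # HDFS block size (an assumption for illustration).
    return max(1, math.ceil(total_output_bytes / target_file_bytes))
```

For example, 300 MiB of output with a 128 MiB target yields 3 files instead of one file per task.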






[jira] [Commented] (SPARK-31686) Return of String instead of array in function get_json_object

2020-05-12 Thread daile (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105939#comment-17105939
 ] 

daile commented on SPARK-31686:
---

[~bruneltouopi] looks like it was specifically removed
{code:java}
val buf = buffer.getBuffer
if (dirty > 1) {
  g.writeRawValue(buf.toString)
} else if (dirty == 1) {
  // remove outer array tokens
  g.writeRawValue(buf.substring(1, buf.length()-1))
} // else do not write anything
{code}
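A simplified model of that logic shows why a single match loses its brackets (illustrative Python, not the Spark implementation):

```python
import json

def write_matches(matches):
    # Simplified model of the quoted Scala: 'dirty' counts matched values.
    # With exactly one match, the outer array tokens are removed, so the
    # caller sees a bare JSON value rather than a one-element array.
    dirty = len(matches)
    buf = json.dumps(matches)
    if dirty > 1:
        return buf
    elif dirty == 1:
        return buf[1:-1]  # remove outer array tokens
    return None  # else write nothing
```

So a path matching the single value "arizona" comes back as a quoted string, matching the behavior reported in this issue.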

> Return of String instead of array in function get_json_object
> -
>
> Key: SPARK-31686
> URL: https://issues.apache.org/jira/browse/SPARK-31686
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: {code:json}
> {
>   "customer": {
>     "addesses": [
>       { "location": "arizona" }
>     ]
>   }
> }
> {code}
> get_json_object(string(customer), '$addresses[*].location')
> returns "arizona", but the expected result is
> ["arizona"]
>Reporter: Touopi Touopi
>Priority: Major
>
> When we select a node of a JSON object that is an array and the array
> contains one element, get_json_object returns a String wrapped in "
> characters instead of an array of one element.
>  






[jira] [Updated] (SPARK-31695) BigDecimal setScale is not working in Spark UDF

2020-05-12 Thread Saravanan Raju (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Saravanan Raju updated SPARK-31695:
---
Description: 
I was trying to convert a JSON column to a map using a UDF, but it is not 
working as expected.
  
{code:java}
val df1 = Seq(("{\"k\":10.004}")).toDF("json")

def udfJsonStrToMapDecimal = udf((jsonStr: String) => {
  val jsonMap = parse(jsonStr).values.asInstanceOf[Map[String, Any]]
  jsonMap.map { case (k, v) =>
    (k, BigDecimal.decimal(v.asInstanceOf[Double]).setScale(6))
  }.toMap
})

val f = df1.withColumn("map", udfJsonStrToMapDecimal($"json"))

scala> f.printSchema
root
 |-- json: string (nullable = true)
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: decimal(38,18) (valueContainsNull = true)
{code}
 

*Instead of decimal(38,6), it converts the value to decimal(38,18).*

  was:
0
I was trying to convert json column to map. I tried udf for converting json to 
map. but it is not working as expected.
 val df1 = Seq(("\{\"k\":10.004}")).toDF("json")
def udfJsonStrToMapDecimal = udf((jsonStr: String)=> \{ var 
jsonMap:Map[String,Any] = parse(jsonStr).values.asInstanceOf[Map[String, Any]]
 jsonMap.map{case(k,v) => 
(k,BigDecimal.decimal(v.asInstanceOf[Double]).setScale(6))}.toMap
})
val f = df1.withColumn("map",udfJsonStrToMapDecimal($"json"))
scala> f.printSchema
root
 |-- json: string (nullable = true)
 |-- map: map (nullable = true)
 ||-- key: string
 ||-- value: decimal(38,18) (valueContainsNull = true){{}}
*instead of decimal(38,6) it converting the value as decimal(38,18)* 
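What the report observes is the declared result type winning over the runtime scale: Spark maps a Scala BigDecimal returned from a UDF to DecimalType(38,18) by default, so the `setScale(6)` applied inside the UDF is rescaled away when the value is stored. An illustrative Python analogue of that rescaling (not Spark code):

```python
from decimal import Decimal

def rescale_to_schema(value, schema_scale=18):
    # Whatever scale the UDF sets at runtime, the stored value is
    # quantized to the schema's scale; DecimalType(38,18) is Spark's
    # default mapping for Scala BigDecimal, hence schema_scale=18.
    return value.quantize(Decimal(1).scaleb(-schema_scale))
```

The numeric value is unchanged; only its scale is forced to 18, which is exactly what the printSchema output shows.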


> BigDecimal setScale is not working in Spark UDF
> ---
>
> Key: SPARK-31695
> URL: https://issues.apache.org/jira/browse/SPARK-31695
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 2.3.4
>Reporter: Saravanan Raju
>Priority: Major
>
> I was trying to convert a JSON column to a map using a UDF, but it is not
> working as expected.
>   
> {code:java}
> val df1 = Seq(("{\"k\":10.004}")).toDF("json")
>
> def udfJsonStrToMapDecimal = udf((jsonStr: String) => {
>   val jsonMap = parse(jsonStr).values.asInstanceOf[Map[String, Any]]
>   jsonMap.map { case (k, v) =>
>     (k, BigDecimal.decimal(v.asInstanceOf[Double]).setScale(6))
>   }.toMap
> })
>
> val f = df1.withColumn("map", udfJsonStrToMapDecimal($"json"))
>
> scala> f.printSchema
> root
>  |-- json: string (nullable = true)
>  |-- map: map (nullable = true)
>  |    |-- key: string
>  |    |-- value: decimal(38,18) (valueContainsNull = true)
> {code}
>  
> *Instead of decimal(38,6), it converts the value to decimal(38,18).*






[jira] [Created] (SPARK-31695) BigDecimal setScale is not working in Spark UDF

2020-05-12 Thread Saravanan Raju (Jira)
Saravanan Raju created SPARK-31695:
--

 Summary: BigDecimal setScale is not working in Spark UDF
 Key: SPARK-31695
 URL: https://issues.apache.org/jira/browse/SPARK-31695
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SQL
Affects Versions: 2.3.4
Reporter: Saravanan Raju


I was trying to convert a JSON column to a map using a UDF, but it is not 
working as expected.

{code:java}
val df1 = Seq(("{\"k\":10.004}")).toDF("json")

def udfJsonStrToMapDecimal = udf((jsonStr: String) => {
  val jsonMap = parse(jsonStr).values.asInstanceOf[Map[String, Any]]
  jsonMap.map { case (k, v) =>
    (k, BigDecimal.decimal(v.asInstanceOf[Double]).setScale(6))
  }.toMap
})

val f = df1.withColumn("map", udfJsonStrToMapDecimal($"json"))

scala> f.printSchema
root
 |-- json: string (nullable = true)
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: decimal(38,18) (valueContainsNull = true)
{code}

*Instead of decimal(38,6), it converts the value to decimal(38,18).*






[jira] [Commented] (SPARK-31640) Support SHOW PARTITIONS for DataSource V2 tables

2020-05-12 Thread Jackey Lee (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105885#comment-17105885
 ] 

Jackey Lee commented on SPARK-31640:


Hi [~younggyuchun] [~brkyvz], I have added this issue to Hive support for 
DataSourceV2 (SPARK-31241), and we have already supported this in our company. 
It may be a better choice to add partition catalog APIs before adding the 
partition commands; with those APIs in place, it will be quite easy to define 
the partition commands.

I have opened a new issue, SPARK-31694, to work on this; we can complete the 
Hive support on DataSourceV2 together. Thanks

> Support SHOW PARTITIONS for DataSource V2 tables
> 
>
> Key: SPARK-31640
> URL: https://issues.apache.org/jira/browse/SPARK-31640
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Burak Yavuz
>Priority: Major
>
> SHOW PARTITIONS is supported for V1 Hive tables. We can also support it for 
> V2 tables, where they return the transforms and the values of those 
> transforms as separate columns.






[jira] [Updated] (SPARK-31694) Add SupportsPartitions Catalog APIs on DataSourceV2

2020-05-12 Thread Jackey Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jackey Lee updated SPARK-31694:
---
Summary: Add SupportsPartitions Catalog APIs on DataSourceV2  (was: Add 
SupportsPartitions on DataSourceV2)

> Add SupportsPartitions Catalog APIs on DataSourceV2
> ---
>
> Key: SPARK-31694
> URL: https://issues.apache.org/jira/browse/SPARK-31694
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Jackey Lee
>Priority: Major
>
> DataSourceV2 has no partition commands, such as AlterTableAddPartition, 
> even though they are widely used in MySQL, Hive, and other systems. Thus it 
> is necessary to define a partition catalog API to support these commands.
> We add SupportsPartitions as the basic API to support them.






[jira] [Created] (SPARK-31694) Add SupportsPartitions on DataSourceV2

2020-05-12 Thread Jackey Lee (Jira)
Jackey Lee created SPARK-31694:
--

 Summary: Add SupportsPartitions on DataSourceV2
 Key: SPARK-31694
 URL: https://issues.apache.org/jira/browse/SPARK-31694
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Jackey Lee


DataSourceV2 has no partition commands, such as AlterTableAddPartition, even 
though they are widely used in MySQL, Hive, and other systems. Thus it is 
necessary to define a partition catalog API to support these commands.

We add SupportsPartitions as the basic API to support them.
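As a rough sketch of what such a partition catalog API could look like (method names here are our guesses for illustration, not the final Spark interface, which would live in the Java connector catalog packages):

```python
from abc import ABC, abstractmethod

class SupportsPartitions(ABC):
    # Hypothetical partition catalog mixin: a table catalog implementing
    # these hooks would let generic commands like ALTER TABLE ... ADD
    # PARTITION work against any V2 source.
    @abstractmethod
    def create_partition(self, ident, spec): ...

    @abstractmethod
    def drop_partition(self, ident, spec): ...

    @abstractmethod
    def list_partitions(self, ident): ...
```

The commands then compile down to calls on these hooks instead of source-specific logic, which is the separation the issue argues for.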






[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105840#comment-17105840
 ] 

Dongjoon Hyun commented on SPARK-31693:
---

Thank you for taking a look at this~

> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Given the series of failures in the Spark packaging Jenkins job, it seems 
> that there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download from the Maven mirror. (SPARK-31691) -> The 
> primary host is okay.
> - The node failed to communicate with repository.apache.org. (Current 
> master branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}
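A connection timeout like the one above can be narrowed down from the affected node with a plain TCP reachability probe, separating a network problem from a Maven one (a local diagnostic sketch, unrelated to Spark itself):

```python
import socket

def can_connect(host, port, timeout=5.0):
    # Attempt only the TCP handshake: True means the port is reachable;
    # False covers refusals, DNS failures, and timeouts alike.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running e.g. `can_connect("repository.apache.org", 443)` on the Jenkins node would show whether the snapshots repository is reachable at the socket level before blaming the deploy plugin.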






[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105815#comment-17105815
 ] 

Shane Knapp commented on SPARK-31693:
-

weird.  nothing has changed on our end and i'm going to have to start debugging.



> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Given the series of failures in the Spark packaging Jenkins job, it seems 
> that there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download from the Maven mirror. (SPARK-31691) -> The 
> primary host is okay.
> - The node failed to communicate with repository.apache.org. (Current 
> master branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}






[jira] [Commented] (SPARK-30098) Use default datasource as provider for CREATE TABLE syntax

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105814#comment-17105814
 ] 

Apache Spark commented on SPARK-30098:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/28517

> Use default datasource as provider for CREATE TABLE syntax
> --
>
> Key: SPARK-30098
> URL: https://issues.apache.org/jira/browse/SPARK-30098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> Change the default provider from `hive` to the value of 
> `spark.sql.sources.default` for the "CREATE TABLE" syntax, making it 
> consistent with the DataFrameWriter.saveAsTable API.
> Also, this is friendlier to end users, since Spark is well known for using 
> parquet (the default value of `spark.sql.sources.default`) as its default 
> I/O format.






[jira] [Commented] (SPARK-31136) Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105812#comment-17105812
 ] 

Apache Spark commented on SPARK-31136:
--

User 'HeartSaVioR' has created a pull request for this issue:
https://github.com/apache/spark/pull/28517

> Revert SPARK-30098 Use default datasource as provider for CREATE TABLE syntax
> -
>
> Key: SPARK-31136
> URL: https://issues.apache.org/jira/browse/SPARK-31136
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> We need to consider the behavior change of SPARK-30098.
> This is a placeholder to record the discussion and the final decision.
> The `CREATE TABLE` syntax changes its behavior silently.
> The following is one example of how it breaks existing user data pipelines.
> *Apache Spark 2.4.5*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> spark-sql> SELECT * FROM t LIMIT 1;
> # Apache Spark
> Time taken: 2.05 seconds, Fetched 1 row(s)
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 3
> {code}
> *Apache Spark 3.0.0-preview2*
> {code}
> spark-sql> CREATE TABLE t(a STRING);
> spark-sql> LOAD DATA INPATH '/usr/local/spark/README.md' INTO TABLE t;
> Error in query: LOAD DATA is not supported for datasource tables: 
> `default`.`t`;
> {code}
> {code}
> spark-sql> CREATE TABLE t(a CHAR(3));
> spark-sql> INSERT INTO TABLE t SELECT 'a ';
> spark-sql> SELECT a, length(a) FROM t;
> a 2
> {code}
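The CHAR(3) difference above comes down to storage semantics, roughly as follows (our illustration, not Spark code):

```python
def hive_char_store(value, n):
    # Hive CHAR(n) pads with spaces on write, so in the 2.4.5 example
    # above 'a ' is stored as 'a  ' and length(a) reads back as 3.
    return value[:n].ljust(n)

def datasource_store(value):
    # A datasource table keeps the string as-is, so in the
    # 3.0.0-preview2 example length('a ') reads back as 2.
    return value
```

The same INSERT therefore produces different lengths depending on which provider silently backed the CREATE TABLE.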












[jira] [Updated] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31693:
--
Description: 
Given the series of failures in the Spark packaging Jenkins job, it seems that 
there is a network issue in the AmpLab Jenkins cluster.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

- The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
- The node failed to download from the Maven mirror. (SPARK-31691) -> The 
primary host is okay.
- The node failed to communicate with repository.apache.org. (Current master 
branch Jenkins job failure)
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
remote metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could not 
transfer metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
 Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
failed: Connection timed out (Connection timed out) -> [Help 1]
{code}

  was:
Given the series of failures in Spark packaging Jenkins job, it seems that 
there is a network issue in AmbLab Jenkins cluster.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

- The node failed to talk to GitBox. (SPARK-31687)
- The node failed to download the maven mirror. (SPARK-31691)
- The node failed to communicate repository.apache.org. (Current master branch 
Jenkins job failure)
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
remote metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could not 
transfer metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
 Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
failed: Connection timed out (Connection timed out) -> [Help 1]
{code}


> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Given the series of failures in the Spark packaging Jenkins job, it seems 
> that there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687) -> GitHub is okay.
> - The node failed to download from the Maven mirror. (SPARK-31691) -> The 
> primary host is okay.
> - The node failed to communicate with repository.apache.org. (Current 
> master branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}






[jira] [Commented] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105807#comment-17105807
 ] 

Dongjoon Hyun commented on SPARK-31693:
---

Hi, [~shaneknapp]. What do you think about the above?
cc [~srowen]

> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Given the series of failures in the Spark packaging Jenkins job, it seems 
> that there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687)
> - The node failed to download from the Maven mirror. (SPARK-31691)
> - The node failed to communicate with repository.apache.org. (Current 
> master branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}






[jira] [Updated] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31693:
--
Description: 
Given the series of failures in the Spark packaging Jenkins job, it seems that 
there is a network issue in the AmpLab Jenkins cluster.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

- The node failed to talk to GitBox. (SPARK-31687)
- The node failed to download the maven mirror. (SPARK-31691)
- The node failed to communicate repository.apache.org. (Current master branch 
Jenkins job failure)
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
remote metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could not 
transfer metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
 Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
failed: Connection timed out (Connection timed out) -> [Help 1]
{code}

  was:
Given the series of failures in the Spark packaging Jenkins job, it seems that 
there is a network issue in the AmpLab Jenkins cluster.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

- The node failed to talk to GitBox.
- The node failed to download the maven mirror.
- The node failed to communicate repository.apache.org.
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
remote metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could not 
transfer metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
 Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
failed: Connection timed out (Connection timed out) -> [Help 1]
{code}


> Investigate AmpLab Jenkins server network issue
> ---
>
> Key: SPARK-31693
> URL: https://issues.apache.org/jira/browse/SPARK-31693
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Given the series of failures in the Spark packaging Jenkins job, it seems that 
> there is a network issue in the AmpLab Jenkins cluster.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/
> - The node failed to talk to GitBox. (SPARK-31687)
> - The node failed to download the maven mirror. (SPARK-31691)
> - The node failed to communicate repository.apache.org. (Current master 
> branch Jenkins job failure)
> {code}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
> on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
> remote metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could 
> not transfer metadata 
> org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
> apache.snapshots.https 
> (https://repository.apache.org/content/repositories/snapshots): Transfer 
> failed for 
> https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
>  Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
> failed: Connection timed out (Connection timed out) -> [Help 1]
> {code}






[jira] [Created] (SPARK-31693) Investigate AmpLab Jenkins server network issue

2020-05-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31693:
-

 Summary: Investigate AmpLab Jenkins server network issue
 Key: SPARK-31693
 URL: https://issues.apache.org/jira/browse/SPARK-31693
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun


Given the series of failures in the Spark packaging Jenkins job, it seems that 
there is a network issue in the AmpLab Jenkins cluster.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/

- The node failed to talk to GitBox.
- The node failed to download the maven mirror.
- The node failed to communicate repository.apache.org.
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-deploy-plugin:3.0.0-M1:deploy (default-deploy) 
on project spark-parent_2.12: ArtifactDeployerException: Failed to retrieve 
remote metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml: Could not 
transfer metadata 
org.apache.spark:spark-parent_2.12:3.1.0-SNAPSHOT/maven-metadata.xml from/to 
apache.snapshots.https 
(https://repository.apache.org/content/repositories/snapshots): Transfer failed 
for 
https://repository.apache.org/content/repositories/snapshots/org/apache/spark/spark-parent_2.12/3.1.0-SNAPSHOT/maven-metadata.xml:
 Connect to repository.apache.org:443 [repository.apache.org/207.244.88.140] 
failed: Connection timed out (Connection timed out) -> [Help 1]
{code}
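The three failures above all point at outbound TCP connections from the build node timing out. As an illustrative diagnostic only (this is not taken from any Spark script, and the helper name is hypothetical), a node-side check of reachability for the hosts named in the report could look like:

```python
import socket

# Illustrative diagnostic, not from any Spark script: check whether this
# machine can complete a TCP handshake with a remote host, e.g.
# can_connect("gitbox.apache.org", 443) or
# can_connect("repository.apache.org", 443).
def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, connection refused, and timeouts alike.
        return False
```

Running this against gitbox.apache.org:443 and repository.apache.org:443 from the affected node would distinguish a node-local network problem from transient remote outages.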






[jira] [Assigned] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31692:


Assignee: (was: Apache Spark)

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Priority: Major
>
> Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory.






[jira] [Assigned] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31692:


Assignee: Apache Spark

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Assignee: Apache Spark
>Priority: Major
>
> Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory.






[jira] [Commented] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105788#comment-17105788
 ] 

Apache Spark commented on SPARK-31692:
--

User 'karuppayya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28516

> Hadoop confs passed via spark config are not set in URLStream Handler Factory
> -
>
> Key: SPARK-31692
> URL: https://issues.apache.org/jira/browse/SPARK-31692
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Karuppayya
>Priority: Major
>
> Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in 
> URLStreamHandlerFactory.






[jira] [Created] (SPARK-31692) Hadoop confs passed via spark config are not set in URLStream Handler Factory

2020-05-12 Thread Karuppayya (Jira)
Karuppayya created SPARK-31692:
--

 Summary: Hadoop confs passed via spark config are not set in 
URLStream Handler Factory
 Key: SPARK-31692
 URL: https://issues.apache.org/jira/browse/SPARK-31692
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Karuppayya


Hadoop confs passed via Spark config (as "spark.hadoop.*") are not set in 
URLStreamHandlerFactory.
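For context on what the report expects: Spark forwards properties with the "spark.hadoop." prefix into the Hadoop configuration with the prefix stripped, and the complaint is that this mapping is not applied on the URLStreamHandlerFactory path. A minimal sketch of the expected prefix-stripping mapping (the helper name is hypothetical, not Spark's actual API):

```python
# Hypothetical sketch of the expected behavior: every "spark.hadoop.*" entry
# in the Spark config should surface as a Hadoop configuration key with the
# prefix stripped. The bug report says this is missing in the
# URLStreamHandlerFactory code path.
def hadoop_confs_from_spark_conf(spark_conf):
    """Return {hadoop_key: value} for every spark.hadoop.* entry."""
    prefix = "spark.hadoop."
    return {
        key[len(prefix):]: value
        for key, value in spark_conf.items()
        if key.startswith(prefix)
    }

spark_conf = {
    "spark.app.name": "demo",
    "spark.hadoop.fs.defaultFS": "hdfs://namenode:8020",
    "spark.hadoop.hadoop.security.authentication": "kerberos",
}
# Only the two spark.hadoop.* entries map through, with the prefix removed.
print(hadoop_confs_from_spark_conf(spark_conf))
```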






[jira] [Resolved] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31691.
---
Fix Version/s: 3.0.0
   2.4.6
 Assignee: Dongjoon Hyun
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/28514

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 2.4.6, 3.0.0
>
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Commented] (SPARK-30654) Update Docs Bootstrap to 4.4.1

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105746#comment-17105746
 ] 

Apache Spark commented on SPARK-30654:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/28515

> Update Docs Bootstrap to 4.4.1
> --
>
> Key: SPARK-30654
> URL: https://issues.apache.org/jira/browse/SPARK-30654
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Dale Clarke
>Assignee: Dale Clarke
>Priority: Major
>
> We are using an older version of Bootstrap (v. 2.1.0) for the online 
> documentation site.  Bootstrap 2.x was moved to EOL in Aug 2013 and Bootstrap 
> 3.x was moved to EOL in July 2019 ([https://github.com/twbs/release]). Older 
> versions of Bootstrap are also getting flagged in security scans for various 
> CVEs:
>  * [https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-72889]
>  * [https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-173700]
>  * [https://snyk.io/vuln/npm:bootstrap:20180529]
>  * [https://snyk.io/vuln/npm:bootstrap:20160627]
> I haven't validated each CVE, but it would probably be good practice to 
> resolve any potential issues and get on a supported release.
> The bad news is that there have been quite a few changes between Bootstrap 2 
> and Bootstrap 4.  I've tried updating the library, refactoring/tweaking the 
> CSS and JS to maintain a similar appearance and functionality, and testing 
> the documentation.  This is a fairly large change so I'm sure additional 
> testing and fixes will be needed.






[jira] [Commented] (SPARK-30654) Update Docs Bootstrap to 4.4.1

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105744#comment-17105744
 ] 

Apache Spark commented on SPARK-30654:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/28515

> Update Docs Bootstrap to 4.4.1
> --
>
> Key: SPARK-30654
> URL: https://issues.apache.org/jira/browse/SPARK-30654
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.1.0
>Reporter: Dale Clarke
>Assignee: Dale Clarke
>Priority: Major
>
> We are using an older version of Bootstrap (v. 2.1.0) for the online 
> documentation site.  Bootstrap 2.x was moved to EOL in Aug 2013 and Bootstrap 
> 3.x was moved to EOL in July 2019 ([https://github.com/twbs/release]). Older 
> versions of Bootstrap are also getting flagged in security scans for various 
> CVEs:
>  * [https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-72889]
>  * [https://snyk.io/vuln/SNYK-JS-BOOTSTRAP-173700]
>  * [https://snyk.io/vuln/npm:bootstrap:20180529]
>  * [https://snyk.io/vuln/npm:bootstrap:20160627]
> I haven't validated each CVE, but it would probably be good practice to 
> resolve any potential issues and get on a supported release.
> The bad news is that there have been quite a few changes between Bootstrap 2 
> and Bootstrap 4.  I've tried updating the library, refactoring/tweaking the 
> CSS and JS to maintain a similar appearance and functionality, and testing 
> the documentation.  This is a fairly large change so I'm sure additional 
> testing and fixes will be needed.






[jira] [Assigned] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31691:


Assignee: Apache Spark

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Assigned] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31691:


Assignee: (was: Apache Spark)

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Commented] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105737#comment-17105737
 ] 

Apache Spark commented on SPARK-31691:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28514

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Commented] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105736#comment-17105736
 ] 

Apache Spark commented on SPARK-31691:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28514

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Updated] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31691:
--
Description: 
SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
in build/mvn.
This breaks dev/create-release/release-build.sh.
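The general shape of the problem: a release script captures the stdout of `build/mvn` expecting a single value, and the fallback notice pollutes that output. A hedged sketch of the fix idea (the notice text is taken from the report above; the parsing helper is illustrative, not the actual release-build.sh logic):

```python
# Sketch of the fix idea: when a script captures `build/mvn` output to read a
# single value (e.g. the project version), informational notices such as the
# "Falling back to archive.apache.org" line must be filtered out first.
def extract_value(mvn_output):
    """Return the last non-notice, non-empty line of captured mvn output."""
    lines = [
        line.strip()
        for line in mvn_output.splitlines()
        if line.strip() and "Falling back to archive.apache.org" not in line
    ]
    return lines[-1] if lines else None

captured = """exec: curl --silent --show-error ...
Falling back to archive.apache.org to download Maven
3.1.0-SNAPSHOT
"""
print(extract_value(captured))  # 3.1.0-SNAPSHOT
```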

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Updated] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31691:
--
Affects Version/s: 2.4.5

> release-build.sh should ignore a fallback output from `build/mvn`
> -
>
> Key: SPARK-31691
> URL: https://issues.apache.org/jira/browse/SPARK-31691
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> SPARK-28963 prints "Falling back to archive.apache.org" when downloading Maven 
> in build/mvn.
> This breaks dev/create-release/release-build.sh.






[jira] [Created] (SPARK-31691) release-build.sh should ignore a fallback output from `build/mvn`

2020-05-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31691:
-

 Summary: release-build.sh should ignore a fallback output from 
`build/mvn`
 Key: SPARK-31691
 URL: https://issues.apache.org/jira/browse/SPARK-31691
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-31687) Use GitHub instead of GitBox in release script

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31687:
--
Summary: Use GitHub instead of GitBox in release script  (was: Use github 
instead of gitbox in Spark Packaging Jenkins jobs)

> Use GitHub instead of GitBox in release script
> --
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.1.0
>
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Resolved] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-31687.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28513
[https://github.com/apache/spark/pull/28513]

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
> Fix For: 3.1.0
>
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Assigned] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-31687:
-

Assignee: Dongjoon Hyun

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Updated] (SPARK-31690) Backport pyspark Interaction to Spark 2.4.x

2020-05-12 Thread Luca Giovagnoli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Giovagnoli updated SPARK-31690:

Priority: Minor  (was: Trivial)

> Backport pyspark Interaction to Spark 2.4.x
> ---
>
> Key: SPARK-31690
> URL: https://issues.apache.org/jira/browse/SPARK-31690
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.4.5
>Reporter: Luca Giovagnoli
>Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In our company, we could really make use of the Interaction PySpark wrapper 
> on Spark 2.4.x.
> "Interaction" is available on Spark 3.0, so I'm proposing to backport the 
> following code to the current Spark 2.4.6-rc1:
> - https://issues.apache.org/jira/browse/SPARK-26970
> - [https://github.com/apache/spark/pull/24426/files]
>  
> I'm available to pick this up if it's approved.






[jira] [Updated] (SPARK-31690) Backport pyspark Interaction to Spark 2.4.x

2020-05-12 Thread Luca Giovagnoli (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Giovagnoli updated SPARK-31690:

Fix Version/s: (was: 2.4.7)
   (was: 2.4.6)

> Backport pyspark Interaction to Spark 2.4.x
> ---
>
> Key: SPARK-31690
> URL: https://issues.apache.org/jira/browse/SPARK-31690
> Project: Spark
>  Issue Type: Improvement
>  Components: ML, PySpark
>Affects Versions: 2.4.5
>Reporter: Luca Giovagnoli
>Priority: Trivial
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> In our company, we could really make use of the Interaction PySpark wrapper 
> on Spark 2.4.x.
> "Interaction" is available on Spark 3.0, so I'm proposing to backport the 
> following code to the current Spark 2.4.6-rc1:
> - https://issues.apache.org/jira/browse/SPARK-26970
> - [https://github.com/apache/spark/pull/24426/files]
>  
> I'm available to pick this up if it's approved.






[jira] [Created] (SPARK-31690) Backport pyspark Interaction to Spark 2.4.x

2020-05-12 Thread Luca Giovagnoli (Jira)
Luca Giovagnoli created SPARK-31690:
---

 Summary: Backport pyspark Interaction to Spark 2.4.x
 Key: SPARK-31690
 URL: https://issues.apache.org/jira/browse/SPARK-31690
 Project: Spark
  Issue Type: Improvement
  Components: ML, PySpark
Affects Versions: 2.4.5
Reporter: Luca Giovagnoli
 Fix For: 2.4.6, 2.4.7


In our company, we could really make use of the Interaction PySpark wrapper on 
Spark 2.4.x.

"Interaction" is available on Spark 3.0, so I'm proposing to backport the 
following code to the current Spark 2.4.6-rc1:
- https://issues.apache.org/jira/browse/SPARK-26970
- [https://github.com/apache/spark/pull/24426/files]

 

I'm available to pick this up if it's approved.






[jira] [Assigned] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31687:


Assignee: Apache Spark

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105686#comment-17105686
 ] 

Apache Spark commented on SPARK-31687:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/28513

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Assigned] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31687:


Assignee: (was: Apache Spark)

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-25557) ORC predicate pushdown for nested fields

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105684#comment-17105684
 ] 

Dongjoon Hyun commented on SPARK-25557:
---

I'm just not working on this, [~omalley]. :)

> ORC predicate pushdown for nested fields
> 
>
> Key: SPARK-25557
> URL: https://issues.apache.org/jira/browse/SPARK-25557
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Assigned] (SPARK-27217) Nested schema pruning doesn't work for aggregation e.g. `sum`.

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27217:


Assignee: Apache Spark

> Nested schema pruning doesn't work for aggregation e.g. `sum`.
> --
>
> Key: SPARK-27217
> URL: https://issues.apache.org/jira/browse/SPARK-27217
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: colin fang
>Assignee: Apache Spark
>Priority: Major
>
> Since SPARK-4502 is fixed, I would expect queries such as `select sum(b.x)` 
> not to read other nested fields.
> {code:python}
> rdd = spark.range(1000).rdd.map(lambda x: [x.id+3, [x.id+1, x.id-1]])
> df = spark.createDataFrame(rdd, schema='a:int,b:struct<x:int,y:int>')
> df.repartition(1).write.mode('overwrite').parquet('test.parquet')
> df = spark.read.parquet('test.parquet')
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'true')
> df.select('b.x').explain()
> # ReadSchema: struct<b:struct<x:int>>
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'false')
> df.select('b.x').explain()
> # ReadSchema: struct<b:struct<x:int,y:int>>
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'true')
> df.selectExpr('sum(b.x)').explain()
> # ReadSchema: struct<b:struct<x:int,y:int>>
> {code}






[jira] [Assigned] (SPARK-27217) Nested schema pruning doesn't work for aggregation e.g. `sum`.

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-27217:


Assignee: (was: Apache Spark)

> Nested schema pruning doesn't work for aggregation e.g. `sum`.
> --
>
> Key: SPARK-27217
> URL: https://issues.apache.org/jira/browse/SPARK-27217
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: colin fang
>Priority: Major
>
> Since SPARK-4502 is fixed, I would expect queries such as `select sum(b.x)` 
> not to read other nested fields.
> {code:python}
> rdd = spark.range(1000).rdd.map(lambda x: [x.id+3, [x.id+1, x.id-1]])
> df = spark.createDataFrame(rdd, schema='a:int,b:struct<x:int,y:int>')
> df.repartition(1).write.mode('overwrite').parquet('test.parquet')
> df = spark.read.parquet('test.parquet')
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'true')
> df.select('b.x').explain()
> # ReadSchema: struct<b:struct<x:int>>
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'false')
> df.select('b.x').explain()
> # ReadSchema: struct<b:struct<x:int,y:int>>
> spark.conf.set('spark.sql.optimizer.nestedSchemaPruning.enabled', 'true')
> df.selectExpr('sum(b.x)').explain()
> # ReadSchema: struct<b:struct<x:int,y:int>>
> {code}






[jira] [Comment Edited] (SPARK-25557) ORC predicate pushdown for nested fields

2020-05-12 Thread DB Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105680#comment-17105680
 ] 

DB Tsai edited comment on SPARK-25557 at 5/12/20, 6:57 PM:
---

[~omalley] No missing part on ORC side. We already have the foundation to 
support nested predicate pushdown in Spark in the parent JIRA, and all we need 
is to hook it up with ORC like 
[https://github.com/apache/spark/pull/27728/files#diff-67a76299606811fd795f69f8d53b6f2bR56]
 for Parquet.


was (Author: dbtsai):
[~omalley] No missing part on ORC side. We already have the foundation to 
support nested predicate pushdown in Spark in the parent JIRA, and all we need 
is to hook it up with ORC like 
[https://github.com/apache/spark/pull/27728/files#diff-67a76299606811fd795f69f8d53b6f2bR56]
 

> ORC predicate pushdown for nested fields
> 
>
> Key: SPARK-25557
> URL: https://issues.apache.org/jira/browse/SPARK-25557
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Commented] (SPARK-25557) ORC predicate pushdown for nested fields

2020-05-12 Thread DB Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105680#comment-17105680
 ] 

DB Tsai commented on SPARK-25557:
-

[~omalley] No missing part on ORC side. We already have the foundation to 
support nested predicate pushdown in Spark in the parent JIRA, and all we need 
is to hook it up with ORC like 
[https://github.com/apache/spark/pull/27728/files#diff-67a76299606811fd795f69f8d53b6f2bR56]
 

> ORC predicate pushdown for nested fields
> 
>
> Key: SPARK-25557
> URL: https://issues.apache.org/jira/browse/SPARK-25557
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105658#comment-17105658
 ] 

Dongjoon Hyun commented on SPARK-31687:
---

After tracing Patrick's repo, I found that the root cause is inside Apache 
Spark git repo. Thanks, [~shaneknapp].

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105655#comment-17105655
 ] 

Dongjoon Hyun commented on SPARK-31687:
---

Oh. Got it.

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31688) Refactor pagination framework for spark web UI pages

2020-05-12 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105650#comment-17105650
 ] 

Apache Spark commented on SPARK-31688:
--

User 'iRakson' has created a pull request for this issue:
https://github.com/apache/spark/pull/28512

> Refactor pagination framework for spark web UI pages
> 
>
> Key: SPARK-31688
> URL: https://issues.apache.org/jira/browse/SPARK-31688
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Rakesh Raushan
>Priority: Minor
>
> Currently, a large chunk of code is copied when we implement pagination using 
> the current pagination framework. We also embed HTML a lot, which decreases 
> code readability. 






[jira] [Assigned] (SPARK-31688) Refactor pagination framework for spark web UI pages

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31688:


Assignee: Apache Spark

> Refactor pagination framework for spark web UI pages
> 
>
> Key: SPARK-31688
> URL: https://issues.apache.org/jira/browse/SPARK-31688
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Rakesh Raushan
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, a large chunk of code is copied when we implement pagination using 
> the current pagination framework. We also embed HTML a lot, which decreases 
> code readability. 






[jira] [Assigned] (SPARK-31688) Refactor pagination framework for spark web UI pages

2020-05-12 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31688:


Assignee: (was: Apache Spark)

> Refactor pagination framework for spark web UI pages
> 
>
> Key: SPARK-31688
> URL: https://issues.apache.org/jira/browse/SPARK-31688
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 3.1.0
>Reporter: Rakesh Raushan
>Priority: Minor
>
> Currently, a large chunk of code is copied when we implement pagination using 
> the current pagination framework. We also embed HTML a lot, which decreases 
> code readability. 






[jira] [Commented] (SPARK-25557) ORC predicate pushdown for nested fields

2020-05-12 Thread Owen O'Malley (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105640#comment-17105640
 ] 

Owen O'Malley commented on SPARK-25557:
---

Are there missing pieces on the ORC side, [~dongjoon]?

> ORC predicate pushdown for nested fields
> 
>
> Key: SPARK-25557
> URL: https://issues.apache.org/jira/browse/SPARK-25557
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: DB Tsai
>Priority: Major
>







[jira] [Created] (SPARK-31689) ShuffleBlockFetchIterator keeps localBlocks in its memory even though it never uses it

2020-05-12 Thread Chandni Singh (Jira)
Chandni Singh created SPARK-31689:
-

 Summary: ShuffleBlockFetchIterator keeps localBlocks in its memory 
even though it never uses it
 Key: SPARK-31689
 URL: https://issues.apache.org/jira/browse/SPARK-31689
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 2.4.5
Reporter: Chandni Singh


The {{localBlocks}} collection is created and used in the {{initialize}} method 
of ShuffleBlockFetchIterator but is never used after that. 
It can be local to the {{initialize}} method instead of being a field on the 
{{ShuffleBlockFetchIterator}} instance. As a field, it holds on to that memory 
for as long as the iterator instance is alive, which is unnecessary.
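As an editorial aside (not part of the report): a minimal, self-contained Python sketch of the suggested change, using hypothetical names rather than Spark's actual Scala internals. The local/remote split is computed inside `initialize()` instead of being stored on the instance:

```python
# Hypothetical sketch: compute the local/remote split inside initialize()
# instead of keeping it as a field, so the intermediate lists become
# garbage-collectable as soon as initialization finishes.
class FetchIterator:
    def __init__(self, block_ids, is_local):
        self.block_ids = block_ids
        self.is_local = is_local
        self.fetch_order = []

    def initialize(self):
        # local_blocks lives only inside this method now -- no lingering field.
        local_blocks = [b for b in self.block_ids if self.is_local(b)]
        remote_blocks = [b for b in self.block_ids if not self.is_local(b)]
        self.fetch_order = local_blocks + remote_blocks


it = FetchIterator(["local_0", "remote_0", "local_1"],
                   lambda b: b.startswith("local"))
it.initialize()
print(it.fetch_order)  # prints ['local_0', 'local_1', 'remote_0']
```

After `initialize()` returns, only the merged fetch order remains reachable; the per-category lists are eligible for collection.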






[jira] [Created] (SPARK-31688) Refactor pagination framework for spark web UI pages

2020-05-12 Thread Rakesh Raushan (Jira)
Rakesh Raushan created SPARK-31688:
--

 Summary: Refactor pagination framework for spark web UI pages
 Key: SPARK-31688
 URL: https://issues.apache.org/jira/browse/SPARK-31688
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 3.1.0
Reporter: Rakesh Raushan


Currently, a large chunk of code is copied when we implement pagination using 
the current pagination framework. We also embed HTML a lot, which decreases code 
readability. 
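Purely illustrative (hypothetical names, not Spark's actual UI classes): one way to read "refactor" here is a shared helper that owns the paging and table-assembly logic, so each page contributes only its row renderer instead of copying the whole loop:

```python
# Hypothetical sketch of a shared pagination helper: each UI page passes its
# rows and a row renderer instead of copy-pasting paging + HTML assembly.
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Page:
    items: List
    page_num: int
    total_pages: int


def paginate(data: Sequence, page_num: int, page_size: int) -> Page:
    total_pages = max(1, -(-len(data) // page_size))  # ceiling division
    p = min(max(page_num, 1), total_pages)            # clamp out-of-range pages
    start = (p - 1) * page_size
    return Page(list(data[start:start + page_size]), p, total_pages)


def render_table(page: Page, render_row: Callable) -> str:
    rows = "".join(f"<tr>{render_row(r)}</tr>" for r in page.items)
    return f"<table>{rows}</table>"


page = paginate(range(1, 11), page_num=2, page_size=4)
print(page.items)  # prints [5, 6, 7, 8]
print(render_table(page, lambda i: f"<td>{i}</td>"))
```

With a helper like this, per-page code shrinks to a data source plus a one-line row renderer, which is the kind of deduplication the report asks for.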






[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105596#comment-17105596
 ] 

Shane Knapp commented on SPARK-31687:
-

actually, it looks to be code in [~pwendell]'s github repo:

https://github.com/pwendell/spark-utils/blob/master/make_release.sh

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105592#comment-17105592
 ] 

Dongjoon Hyun commented on SPARK-31679:
---

AFAIK, both successful and failed builds have the same lifetime, 
[~gsomogyi]. 

Usually, for a flaky test, we can search the results by the test name. However, 
it's a `SuiteSelector` issue in this case. :(
Failed
org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it is 
a sbt.testing.SuiteSelector)

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(L

[jira] [Comment Edited] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105592#comment-17105592
 ] 

Dongjoon Hyun edited comment on SPARK-31679 at 5/12/20, 4:45 PM:
-

AFAIK, both successful and failed builds have the same lifetime, 
[~gsomogyi]. 

Usually, for a flaky test, we can search the results by the test name. However, 
it's a `SuiteSelector` issue in this case. :(
{code}
Failed
 org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
is a sbt.testing.SuiteSelector)
{code}


was (Author: dongjoon):
AFAIK, both successful and failed builds have the same lifetime, 
[~gsomogyi]. 

Usually, for a flaky test, we can search the results by the test name. However, 
it's a `SuiteSelector` issue in this case. :(
Failed
org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it is 
a sbt.testing.SuiteSelector)

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginCont

[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105576#comment-17105576
 ] 

Dongjoon Hyun commented on SPARK-31687:
---

The code itself seems to be AMPLab script code, which outside committers cannot 
touch.

It may be a rate-limiter issue on the Apache GitBox repo.

Shall we use GitHub, since it's more robust and stable?
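For concreteness (an editorial sketch, not from the thread): switching an existing checkout from GitBox to the GitHub mirror is a one-line remote change. The snippet uses a throwaway repository so the commands run anywhere git is installed, without touching a real Jenkins workspace:

```shell
# Create a scratch repo so the commands are self-contained.
tmp=$(mktemp -d)
cd "$tmp"
git init -q .
# A job checking out via GitBox would have a remote like this:
git remote add origin https://gitbox.apache.org/repos/asf/spark.git
# The proposed fix: point the same remote at the GitHub mirror instead.
git remote set-url origin https://github.com/apache/spark.git
git remote get-url origin   # prints https://github.com/apache/spark.git
```

Since GitHub mirrors the ASF repository, subsequent fetches see the same history while avoiding the gitbox.apache.org connectivity issue quoted in the report.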

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Shane Knapp (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105570#comment-17105570
 ] 

Shane Knapp commented on SPARK-31687:
-

did something change that i was unaware of?

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Updated] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-31687:
--
Description: 
Currently, Spark Packaging jobs are broken.

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console

- 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console

 - 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
{code:java}
fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
Failed to connect to gitbox.apache.org port 443: Connection timed out{code}

  was:
Currently, Spark Packaging jobs are broken.

- [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/]

 
{code:java}
fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
Failed to connect to gitbox.apache.org port 443: Connection timed out{code}


> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Commented] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105566#comment-17105566
 ] 

Dongjoon Hyun commented on SPARK-31687:
---

Hi, [~shaneknapp]. Could you take a look at this please?

> Use github instead of gitbox in Spark Packaging Jenkins jobs
> 
>
> Key: SPARK-31687
> URL: https://issues.apache.org/jira/browse/SPARK-31687
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Blocker
>
> Currently, Spark Packaging jobs are broken.
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-maven-snapshots/2906/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-3.0-maven-snapshots/105/console
>  - 
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-branch-2.4-maven-snapshots/439/console
> {code:java}
> fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
> Failed to connect to gitbox.apache.org port 443: Connection timed out{code}






[jira] [Created] (SPARK-31687) Use github instead of gitbox in Spark Packaging Jenkins jobs

2020-05-12 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-31687:
-

 Summary: Use github instead of gitbox in Spark Packaging Jenkins 
jobs
 Key: SPARK-31687
 URL: https://issues.apache.org/jira/browse/SPARK-31687
 Project: Spark
  Issue Type: Bug
  Components: Project Infra
Affects Versions: 3.0.0
Reporter: Dongjoon Hyun


Currently, Spark Packaging jobs are broken.

- [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/]

 
{code:java}
fatal: unable to access 'https://gitbox.apache.org/repos/asf/spark.git/': 
Failed to connect to gitbox.apache.org port 443: Connection timed out{code}
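
The fix presumably amounts to pointing the packaging jobs' checkout at the GitHub mirror instead of gitbox. A minimal sketch of the URL substitution (the function name is illustrative; the real change lives in the Jenkins job configuration):

```python
# Sketch: rewrite the gitbox fetch URL used by the packaging jobs to the
# GitHub mirror. Illustrative only -- the actual fix is a Jenkins config edit.
def to_github_mirror(url: str) -> str:
    return url.replace("https://gitbox.apache.org/repos/asf/",
                       "https://github.com/apache/")

print(to_github_mirror("https://gitbox.apache.org/repos/asf/spark.git"))
# https://github.com/apache/spark.git
```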






[jira] [Assigned] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-12 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li reassigned SPARK-31387:
---

Assignee: Ali Smesseim  (was: Ali Afroozeh)

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Assignee: Ali Smesseim
>Priority: Major
> Fix For: 3.0.0
>
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError, throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control-flow issues for the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus, which 
> catches the exception, but it would still be better if an invalid update 
> were logged instead of throwing an exception.
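
The defensive behaviour the ticket asks for can be sketched as follows. This is a hypothetical Python analogue, not the Scala HiveThriftServer2Listener; the method and field names are illustrative:

```python
import logging

logger = logging.getLogger("ThriftServerListener")

class SessionListener:
    """Sketch of the requested pattern: updates for unknown session ids are
    logged and ignored instead of raising an exception."""

    def __init__(self):
        self._sessions = {}

    def on_session_created(self, session_id):
        self._sessions[session_id] = {"state": "started"}

    def on_session_closed(self, session_id):
        session = self._sessions.get(session_id)
        if session is None:
            # Previously this lookup raised (NPE / NoSuchElementException);
            # with the fix, an invalid update is merely logged.
            logger.warning("onSessionClosed called with unknown id %s", session_id)
            return
        session["state"] = "closed"
```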






[jira] [Resolved] (SPARK-31387) HiveThriftServer2Listener update methods fail with unknown operation/session id

2020-05-12 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-31387.
-
Fix Version/s: 3.0.0
 Assignee: Ali Afroozeh
   Resolution: Fixed

> HiveThriftServer2Listener update methods fail with unknown operation/session 
> id
> ---
>
> Key: SPARK-31387
> URL: https://issues.apache.org/jira/browse/SPARK-31387
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.4, 2.4.5, 3.0.0
>Reporter: Ali Smesseim
>Assignee: Ali Afroozeh
>Priority: Major
> Fix For: 3.0.0
>
>
> HiveThriftServer2Listener update methods, such as onSessionClosed and 
> onOperationError, throw a NullPointerException (in Spark 3) or a 
> NoSuchElementException (in Spark 2) when the input session/operation id is 
> unknown. In Spark 2, this can cause control-flow issues for the caller of 
> the listener. In Spark 3, the listener is called by a ListenerBus, which 
> catches the exception, but it would still be better if an invalid update 
> were logged instead of throwing an exception.






[jira] [Updated] (SPARK-31610) Expose hashFuncVersion property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-31610:
--
Issue Type: Improvement  (was: Bug)

> Expose hashFuncVersion property in HashingTF
> 
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.0.0
>
>
> Expose hashFuncVersion property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Updated] (SPARK-31610) Expose hashFuncVersion property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-31610:
--
Priority: Major  (was: Critical)

> Expose hashFuncVersion property in HashingTF
> 
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.0.0
>
>
> Expose hashFuncVersion property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Updated] (SPARK-31610) Expose hashFuncVersion property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-31610:
--
Description: 
Expose hashFuncVersion property in HashingTF

Some third-party libraries, such as MLeap, need to access it.
See background description here:
https://github.com/combust/mleap/pull/665#issuecomment-621258623


  was:
Expose hashFunc property in HashingTF

Some third-party library such as mleap need to access it.
See background description here:
https://github.com/combust/mleap/pull/665#issuecomment-621258623



> Expose hashFuncVersion property in HashingTF
> 
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Critical
> Fix For: 3.0.0
>
>
> Expose hashFuncVersion property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Assigned] (SPARK-31610) Expose hashFunc property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng reassigned SPARK-31610:
-

Assignee: Weichen Xu

> Expose hashFunc property in HashingTF
> -
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Critical
>
> Expose hashFunc property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Resolved] (SPARK-31668) Saving and loading HashingTF leads to hash function changed

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-31668.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28413
[https://github.com/apache/spark/pull/28413]

> Saving and loading HashingTF leads to hash function changed
> ---
>
> Key: SPARK-31668
> URL: https://issues.apache.org/jira/browse/SPARK-31668
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0, 3.0.1, 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Blocker
> Fix For: 3.0.0
>
>
> If we save a HashingTF with Spark 2.x, load it with Spark 3.0, save it 
> again with Spark 3.0, and then load it once more with Spark 3.0, the hash 
> function changes.
> This bug is hard to debug, so we need to fix it.
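
The round-trip problem above can be sketched with a version field in the saved metadata: when old saves lack the field, the loader must default it to the legacy hash function instead of silently upgrading. This is a hypothetical Python illustration; the field names are not Spark's:

```python
import json

LEGACY_HASH, NEW_HASH = 1, 2  # illustrative version identifiers

def save(model: dict) -> str:
    # Saves all metadata, including the hash function version if present.
    return json.dumps(model)

def load(blob: str) -> dict:
    model = json.loads(blob)
    # Models saved before the version field existed must default to the
    # legacy hash function, so a load/save/load cycle never changes it.
    model.setdefault("hashFuncVersion", LEGACY_HASH)
    return model

old_blob = json.dumps({"numFeatures": 262144})   # "2.x" save: no version field
roundtripped = load(save(load(old_blob)))        # "3.0" load, save, load again
assert roundtripped["hashFuncVersion"] == LEGACY_HASH
```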






[jira] [Updated] (SPARK-31610) Expose hashFuncVersion property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-31610:
--
Summary: Expose hashFuncVersion property in HashingTF  (was: Expose 
hashFunc property in HashingTF)

> Expose hashFuncVersion property in HashingTF
> 
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Critical
> Fix For: 3.0.0
>
>
> Expose hashFunc property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Resolved] (SPARK-31610) Expose hashFunc property in HashingTF

2020-05-12 Thread Xiangrui Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng resolved SPARK-31610.
---
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28413
[https://github.com/apache/spark/pull/28413]

> Expose hashFunc property in HashingTF
> -
>
> Key: SPARK-31610
> URL: https://issues.apache.org/jira/browse/SPARK-31610
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Critical
> Fix For: 3.0.0
>
>
> Expose hashFunc property in HashingTF
> Some third-party libraries, such as MLeap, need to access it.
> See background description here:
> https://github.com/combust/mleap/pull/665#issuecomment-621258623






[jira] [Commented] (SPARK-31588) merge small files may need more common setting

2020-05-12 Thread philipse (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105530#comment-17105530
 ] 

philipse commented on SPARK-31588:
--

Thanks Hyukjin for your advice, I will reconsider it.
> merge small files may need more common setting
> --
>
> Key: SPARK-31588
> URL: https://issues.apache.org/jira/browse/SPARK-31588
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5
> Environment: spark:2.4.5
> hdp:2.7
>Reporter: philipse
>Priority: Major
>
> Hi,
> Spark SQL now allows us to use REPARTITION or COALESCE hints to manually 
> control small files, like the following:
> /*+ REPARTITION(1) */
> /*+ COALESCE(1) */
> But this can only be tuned case by case: we have to decide whether to use 
> COALESCE or REPARTITION. Could we try a more general way to avoid that 
> decision by setting a target file size, as Hive does?
> *Good points:*
> 1) We also get the new partition count automatically.
> 2) With an ON-OFF parameter provided, users can disable it if needed.
> 3) The parameter can be set at the cluster level instead of on the user 
> side, making small files much easier to control.
> 4) It greatly reduces the pressure on the NameNode.
>  
> *Not good points:*
> 1) It adds a new task that calculates the target partition count from 
> statistics on the output files.
>  
> I don't know whether we have planned this in the future.
>  
> Thanks
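
The target-size approach described above can be sketched as deriving the coalesce target from an estimated output size and a cluster-level target file size. The function name and the 128 MB default are illustrative, not an existing Spark setting:

```python
import math

def target_partitions(estimated_output_bytes: int,
                      target_file_bytes: int = 128 * 1024 * 1024) -> int:
    """Sketch of the proposal: compute the number of output partitions from
    an estimated total output size, instead of hand-writing a
    REPARTITION(n)/COALESCE(n) hint per query. Names are hypothetical."""
    return max(1, math.ceil(estimated_output_bytes / target_file_bytes))

print(target_partitions(900 * 1024 * 1024))  # ~900 MB -> 8 files of <=128 MB
```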






[jira] [Created] (SPARK-31686) Return of String instead of array in function get_json_object

2020-05-12 Thread Touopi Touopi (Jira)
Touopi Touopi created SPARK-31686:
-

 Summary: Return of String instead of array in function 
get_json_object
 Key: SPARK-31686
 URL: https://issues.apache.org/jira/browse/SPARK-31686
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.5
 Environment: {code:json}
{
  "customer": {
    "addresses": [
      { "location": "arizona" }
    ]
  }
}
{code}

 get_json_object(string(customer), '$.addresses[*].location')

returns "arizona"

result expected should be

["arizona"]
Reporter: Touopi Touopi


When selecting a node of a JSON object that is an array, and the array 
contains one element, get_json_object returns a String wrapped in " 
characters instead of an array with one element.
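
The expected semantics can be illustrated with plain Python. This only emulates the reporter's expectation with the stdlib `json` module; it is not Spark's get_json_object implementation:

```python
import json

doc = {"customer": {"addresses": [{"location": "arizona"}]}}

# What the reporter expects from a wildcard path like
# '$.customer.addresses[*].location': a JSON array, even when the match
# contains a single element. Spark 2.4.5 reportedly returns "arizona" instead.
matches = [addr["location"] for addr in doc["customer"]["addresses"]]
print(json.dumps(matches))  # ["arizona"]
```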

 






[jira] [Assigned] (SPARK-31680) Support Java 8 datetime types by Random data generator

2020-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31680:
---

Assignee: Maxim Gekk

> Support Java 8 datetime types by Random data generator
> --
>
> Key: SPARK-31680
> URL: https://issues.apache.org/jira/browse/SPARK-31680
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> Currently, RandomDataGenerator.forType can generate:
> * java.sql.Date values for DateType
> * java.sql.Timestamp values for TimestampType
> The ticket aims to support java.time.Instant for TimestampType and 
> java.time.LocalDate for DateType.
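
A Python analogue of the requested generator change might look like this: the generator gains a flag that selects between a legacy-style value and a Java 8-style one. Purely illustrative; the real change is in Scala's RandomDataGenerator:

```python
import datetime
import random

def random_timestamp(java8: bool, rng: random.Random) -> datetime.datetime:
    """Sketch: like java.sql.Timestamp vs java.time.Instant, produce either a
    naive datetime (legacy) or a timezone-aware one (Java 8 style) depending
    on a flag. Names and the range bound are hypothetical."""
    epoch_seconds = rng.randint(0, 2_000_000_000)
    if java8:
        # Aware value, analogous to java.time.Instant
        return datetime.datetime.fromtimestamp(epoch_seconds,
                                               tz=datetime.timezone.utc)
    # Naive value, analogous to java.sql.Timestamp
    return datetime.datetime.utcfromtimestamp(epoch_seconds)
```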






[jira] [Resolved] (SPARK-31680) Support Java 8 datetime types by Random data generator

2020-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31680.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28502
[https://github.com/apache/spark/pull/28502]

> Support Java 8 datetime types by Random data generator
> --
>
> Key: SPARK-31680
> URL: https://issues.apache.org/jira/browse/SPARK-31680
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently, RandomDataGenerator.forType can generate:
> * java.sql.Date values for DateType
> * java.sql.Timestamp values for TimestampType
> The ticket aims to support java.time.Instant for TimestampType and 
> java.time.LocalDate for DateType.






[jira] [Resolved] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs

2020-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31678.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28499
[https://github.com/apache/spark/pull/28499]

> PrintStackTrace for Spark SQL CLI when error occurs
> ---
>
> Key: SPARK-31678
> URL: https://issues.apache.org/jira/browse/SPARK-31678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.0.0
>
>
> When I was finding the root cause of 
> https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it was very 
> difficult to see what was actually going on, since the CLI outputs nothing 
> else but
> {code:java}
> Error in query: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000,
>  expected: hdfs://cluster1
> {code}
> It is really hard to find the root cause from such a terse error message.






[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105436#comment-17105436
 ] 

Gabor Somogyi commented on SPARK-31679:
---

Ah, as I see, only successful builds keep artifacts, and only for a certain 
time, so that's not really helpful :)
I think I'm stuck here because the logs are not available...


> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticat

[jira] [Assigned] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs

2020-05-12 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31678:
---

Assignee: Kent Yao

> PrintStackTrace for Spark SQL CLI when error occurs
> ---
>
> Key: SPARK-31678
> URL: https://issues.apache.org/jira/browse/SPARK-31678
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.5, 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
>
> When I was finding the root cause of 
> https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it was very 
> difficult to see what was actually going on, since the CLI outputs nothing 
> else but
> {code:java}
> Error in query: java.lang.IllegalArgumentException: Wrong FS: 
> hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000,
>  expected: hdfs://cluster1
> {code}
> It is really hard to find the root cause from such a terse error message.






[jira] [Commented] (SPARK-30741) The data returned from SAS using JDBC reader contains column label

2020-05-12 Thread Gary Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105431#comment-17105431
 ] 

Gary Liu commented on SPARK-30741:
--

Can anyone look at this issue please?

> The data returned from SAS using JDBC reader contains column label
> --
>
> Key: SPARK-30741
> URL: https://issues.apache.org/jira/browse/SPARK-30741
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, PySpark
>Affects Versions: 2.1.1, 2.3.4, 2.4.5
>Reporter: Gary Liu
>Priority: Major
> Attachments: ExamplesFromSASSupport.png, ReplyFromSASSupport.png, 
> SparkBug.png
>
>
> When reading SAS data using JDBC with the SAS SHARE driver, the returned 
> data contains the column labels rather than the data.
> According to testing results from SAS Support, the results are correct when 
> using Java, so they believe the problem is in Spark's reading.






[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105407#comment-17105407
 ] 

Sean R. Owen commented on SPARK-31679:
--

They get removed after a certain period of time; if it's not accessible any 
more I think it's just gone.

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
>   at 
> org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60)
>   at 
>

[jira] [Commented] (SPARK-31430) Bug in the approximate quantile computation.

2020-05-12 Thread Karim Magomedov (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105363#comment-17105363
 ] 

Karim Magomedov commented on SPARK-31430:
-

I'd like to work on this issue

> Bug in the approximate quantile computation.
> 
>
> Key: SPARK-31430
> URL: https://issues.apache.org/jira/browse/SPARK-31430
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Siddartha Naidu
>Priority: Major
> Attachments: approx_quantile_data.csv
>
>
> I am seeing a bug where passing a lower relative error to the 
> {{approxQuantile}} function leads to an incorrect result in the presence of 
> partitions. Setting a relative error of 1e-6 causes it to compute equal 
> values for the 0.9 and 1.0 quantiles. Coalescing back to 1 partition gives 
> correct results. This issue was not present in Spark version 2.4.5; we 
> noticed it when testing 3.0.0-preview.
> {{>>> df = spark.read.csv('file:///tmp/approx_quantile_data.csv', 
> header=True, 
> schema=T.StructType([T.StructField('Store',T.StringType(),True),T.StructField('seconds',T.LongType(),True)]))}}
> {{>>> df = df.repartition(200, 'Store').localCheckpoint()}}
> {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.0001)}}
> {{[1422576000.0, 1430352000.0, 1438300800.0]}}
> {{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 0.1)}}
> {{[1422576000.0, 1430524800.0, 1438300800.0]}}
> {color:#de350b}{{>>> df.approxQuantile('seconds', [0.8, 0.9, 1.0], 
> 0.01)}}{color}
> {color:#de350b}{{[1422576000.0, 1438300800.0, 1438300800.0]}}{color}
> {{>>> df.coalesce(1).approxQuantile('seconds', [0.8, 0.9, 1.0], 0.01)}}
> {{[1422576000.0, 1430524800.0, 1438300800.0]}}
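
The guarantee at stake can be checked directly: for a requested quantile q and relative error eps, the returned value's rank should lie within eps*N of q*N. A minimal sketch in plain Python, on synthetic data rather than the attached CSV (the helper name is illustrative, not part of Spark):

```python
# A minimal sketch (plain Python, synthetic data -- not the attached CSV) of
# the rank-error guarantee approxQuantile is expected to satisfy: the value
# returned for quantile q with relative error eps should have a rank within
# eps*N of q*N.
import bisect

def rank_error_ok(data_sorted, value, q, eps):
    """Check whether `value` is an acceptable eps-approximate q-quantile."""
    n = len(data_sorted)
    lo = bisect.bisect_left(data_sorted, value)    # smallest rank of value
    hi = bisect.bisect_right(data_sorted, value)   # largest rank of value
    target = q * n
    return (lo - eps * n) <= target <= (hi + eps * n)

data = sorted(range(1, 1001))
# The exact 0.9-quantile passes even a tight error bound...
assert rank_error_ok(data, 900, 0.9, 0.001)
# ...but returning the maximum as the 0.9-quantile violates eps=0.01,
# which is exactly the symptom reported above.
assert not rank_error_ok(data, 1000, 0.9, 0.01)
```

A check like this makes the reported behavior a hard guarantee violation rather than a matter of taste: at eps=0.01 the 0.9- and 1.0-quantiles must not coincide on this data.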






[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-05-12 Thread Rajeev Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105309#comment-17105309
 ] 

Rajeev Kumar commented on SPARK-31685:
--

Another ticket is open for a similar issue:

https://issues.apache.org/jira/browse/SPARK-26385

I had posted comments on that Jira and was asked to create a new one.

> Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN 
> expiration issue
> ---
>
> Key: SPARK-31685
> URL: https://issues.apache.org/jira/browse/SPARK-31685
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: Rajeev Kumar
>Priority: Major
>
> I am facing an issue with spark-2.4.4-bin-hadoop2.7. I am using Spark 
> Structured Streaming with Kafka, reading the stream from Kafka and saving it 
> to HBase.
> I get this error on the driver after 24 hours.
>  
> {code:java}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
> at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169)
> at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at 
> org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171)
> at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630)
> at 
> org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
> at 
> org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
> at 
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
> at 
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$e

[jira] [Commented] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-05-12 Thread Rajeev Kumar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105307#comment-17105307
 ] 

Rajeev Kumar commented on SPARK-31685:
--

I did some testing and found one issue in HadoopFSDelegationTokenProvider. I 
might be wrong, so please validate the analysis below.

Spark does not renew the token; rather, it creates a new token at the 
scheduled interval.

Spark needs two HDFS_DELEGATION_TOKENs: one for the resource manager and a 
second for the application user.
{code:java}
val fetchCreds = fetchDelegationTokens(getTokenRenewer(hadoopConf), fsToGetTokens, creds)
// Get the token renewal interval if it is not set. It will only be called once.
if (tokenRenewalInterval == null) {
  tokenRenewalInterval = getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
}
{code}
On the first call to obtainDelegationTokens, both tokens are created correctly.

The token for the resource manager is created by the fetchDelegationTokens 
method.

The token for the application user is created inside the 
getTokenRenewalInterval method.
{code:java}
private var tokenRenewalInterval: Option[Long] = null
{code}
{code:java}
sparkConf.get(PRINCIPAL).flatMap { renewer =>
  val creds = new Credentials()
  fetchDelegationTokens(renewer, filesystems, creds)
{code}
But after 18 hours (or whatever the renewal period is), when the scheduled 
thread of AMCredentialRenewer tries to create the HDFS_DELEGATION_TOKEN, it 
creates only one token (for the resource manager, as a result of the call to 
fetchDelegationTokens).

It does not create an HDFS_DELEGATION_TOKEN for the application user, because 
tokenRenewalInterval is NOT null this time. Hence, after the 
HDFS_DELEGATION_TOKEN expires (typically after 24 hours), Spark fails to 
update the checkpointing directory and the job dies.

As part of my testing, I simply also called getTokenRenewalInterval in the 
else block, and the job keeps running fine. It did not die after 24 hours.
{code:java}
if (tokenRenewalInterval == null) {
// I put this custom log
  logInfo("Token Renewal interval is null. Calling getTokenRenewalInterval "
+ getTokenRenewer(hadoopConf))
  tokenRenewalInterval =
getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
} else {
// I put this custom log
  logInfo("Token Renewal interval is NOT null. Calling getTokenRenewalInterval "
+ getTokenRenewer(hadoopConf))
  getTokenRenewalInterval(hadoopConf, sparkConf, fsToGetTokens)
}
{code}
Driver logs -
{code:java}
20/05/01 14:36:19 INFO HadoopFSDelegationTokenProvider: Token Renewal interval is null. Calling getTokenRenewalInterval rm/host:port
20/05/02 08:36:42 INFO HadoopFSDelegationTokenProvider: Token Renewal interval is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 02:37:00 INFO HadoopFSDelegationTokenProvider: Token Renewal interval is NOT null. Calling getTokenRenewalInterval rm/host:port
20/05/03 20:37:18 INFO HadoopFSDelegationTokenProvider: Token Renewal interval is NOT null. Calling getTokenRenewalInterval rm/host:port
{code}

> Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN 
> expiration issue
> ---
>
> Key: SPARK-31685
> URL: https://issues.apache.org/jira/browse/SPARK-31685
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
> Environment: spark-2.4.4-bin-hadoop2.7
>Reporter: Rajeev Kumar
>Priority: Major
>
> I am facing an issue with spark-2.4.4-bin-hadoop2.7. I am using Spark 
> Structured Streaming with Kafka, reading the stream from Kafka and saving it 
> to HBase.
> I get this error on the driver after 24 hours.
>  
> {code:java}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
>  token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
> at org.apache.hadoop.ipc.Client.call(Client.java:1475)
> at org.apache.hadoop.ipc.Client.call(Client.java:1412)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
> at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
> at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs

[jira] [Created] (SPARK-31685) Spark structured streaming with Kafka fails with HDFS_DELEGATION_TOKEN expiration issue

2020-05-12 Thread Rajeev Kumar (Jira)
Rajeev Kumar created SPARK-31685:


 Summary: Spark structured streaming with Kafka fails with 
HDFS_DELEGATION_TOKEN expiration issue
 Key: SPARK-31685
 URL: https://issues.apache.org/jira/browse/SPARK-31685
 Project: Spark
  Issue Type: Bug
  Components: Structured Streaming
Affects Versions: 2.4.4
 Environment: spark-2.4.4-bin-hadoop2.7
Reporter: Rajeev Kumar


I am facing an issue with spark-2.4.4-bin-hadoop2.7. I am using Spark 
Structured Streaming with Kafka, reading the stream from Kafka and saving it 
to HBase.

I get this error on the driver after 24 hours.

 
{code:java}
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken):
 token (HDFS_DELEGATION_TOKEN token 6972072 for ) is expired
at org.apache.hadoop.ipc.Client.call(Client.java:1475)
at org.apache.hadoop.ipc.Client.call(Client.java:1412)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
at sun.reflect.GeneratedMethodAccessor27.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at org.apache.hadoop.fs.Hdfs.getFileStatus(Hdfs.java:130)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1169)
at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1165)
at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1171)
at org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1630)
at 
org.apache.spark.sql.execution.streaming.FileContextBasedCheckpointFileManager.exists(CheckpointFileManager.scala:326)
at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.get(HDFSMetadataLog.scala:142)
at 
org.apache.spark.sql.execution.streaming.HDFSMetadataLog.add(HDFSMetadataLog.scala:110)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply$mcV$sp(MicroBatchExecution.scala:382)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1$$anonfun$apply$mcZ$sp$3.apply(MicroBatchExecution.scala:381)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply$mcZ$sp(MicroBatchExecution.scala:381)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch$1.apply(MicroBatchExecution.scala:337)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.withProgressLocked(MicroBatchExecution.scala:557)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$constructNextBatch(MicroBatchExecution.scala:337)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:183)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
at 
org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution

[jira] [Resolved] (SPARK-30828) Improve insertInto behaviour

2020-05-12 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-30828.
--
Resolution: Won't Fix

> Improve insertInto behaviour
> 
>
> Key: SPARK-30828
> URL: https://issues.apache.org/jira/browse/SPARK-30828
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.1.0
>Reporter: German Schiavon Matteo
>Assignee: Apache Spark
>Priority: Minor
>
> Currently, when you call *_insertInto_* to add a DataFrame to an existing 
> table, the only safety check is that the number of columns matches; the 
> column order doesn't matter, and the message when the number of columns 
> doesn't match is not very helpful, especially when you have a lot of columns:
> {code:java}
>  org.apache.spark.sql.AnalysisException: `default`.`table` requires that the 
> data to be inserted have the same number of columns as the target table: 
> target table has 2 column(s) but the inserted data has 1 column(s), including 
> 0 partition column(s) having constant value(s).; {code}
> I think a standard column check would be very helpful, just like in most 
> other cases in Spark:
>  
> {code:java}
> "cannot resolve 'p2' given input columns: [id, p1];"  
> {code}
>  
>  






[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105237#comment-17105237
 ] 

Gabor Somogyi commented on SPARK-31679:
---

As far as I can see, there are no artifacts for previous Jenkins executions, 
which blocks further investigation.
Without the logs I can't really tell what happened with the KDC.

[~srowen] [~dongjoon] is it possible to reach the artifacts of previous 
Jenkins executions somehow?


> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at 
> javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
>   at javax.security.auth.login.LoginCon

[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105236#comment-17105236
 ] 

Gabor Somogyi commented on SPARK-31679:
---

Hmmm, I've taken a look at the log and found only two kinds of krb5.conf entries:

Initially:
{code:java}
Java config name: null
{code}

Later multiple times:
{code:java}
Java config name: /home/jenkins/workspace/SparkPullRequestBuilder@3/external/kafka-0-10-sql/target/tmp/org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite/spark-ee403853-36d2-4c22-a84b-4c6ae8c016b0/1588824578912/krb5.conf
{code}

This tells me that the krb5.conf is probably not changed (at least not 
through the API).


> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerberos database (6) - Client not found in Kerberos 
> database
>   at 
> com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804)
>   at 
> com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
>   at 
> javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
>   at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
>   at javax.security.auth.login.Login

[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105214#comment-17105214
 ] 

Gabor Somogyi commented on SPARK-31679:
---

I've tried, without success, to reproduce the issue where the following 
exception occurs:
{quote}Client not found in Kerberos database{quote}
The only scenario I can imagine is the following:
* A KDC starts and sets its kdc.conf
* The KDC adds the user
* The test starts
* Another KDC starts and overwrites kdc.conf
* The test tries to authenticate but connects to the second KDC, which 
doesn't have the user

I think it would be good to print out the actual kdc.conf when a test fails. 
That way we can confirm or rule out this possibility.
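
The suspected sequence can be sketched in plain Python (the file name, realm names, and config format are purely illustrative; real KDC setup writes a richer config):

```python
# Illustrative sketch of the suspected race: a second KDC clobbering the
# shared Kerberos config between client registration and authentication.
import os
import tempfile

conf = os.path.join(tempfile.mkdtemp(), "krb5.conf")

def start_kdc(realm):
    # each KDC unconditionally rewrites the shared config file
    with open(conf, "w") as f:
        f.write(f"default_realm = {realm}\n")

start_kdc("TEST.A")              # KDC 1 starts; the client is registered here
client_realm = "TEST.A"
start_kdc("TEST.B")              # KDC 2 starts later and overwrites the config

with open(conf) as f:
    active_realm = f.read().split("= ")[1].strip()

# The test would now authenticate against TEST.B, where the client is
# unknown -- matching "Client not found in Kerberos database".
assert active_realm != client_realm
```

Printing the active config path and contents on failure, as suggested above, would show whether the suite really ended up pointing at a config written by a different KDC.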

> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 

[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient

2020-05-12 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105207#comment-17105207
 ] 

Gabor Somogyi commented on SPARK-31679:
---

I've made the intended modification in the following branch: 
https://github.com/gaborgsomogyi/spark/blob/5598ce9227a3835a3e5de7279d27d89208815c4e/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDelegationTokenSuite.scala#L53-L58

The new approach is to move all the setup code into the retry block. The main problem 
is that Kafka is unable to authenticate on the second attempt:
{code:java}
20/05/11 06:54:03 INFO Selector: [AdminClient clientId=adminclient-2] Failed 
authentication with localhost/127.0.0.1 (Authentication failed during 
authentication due to invalid credentials with SASL mechanism GSSAPI)
20/05/11 06:54:03 INFO HadoopDelegationTokenManager: Attempting to login to KDC 
using principal: client/localh...@example.com
20/05/11 06:54:03 INFO HadoopDelegationTokenManager: Successfully logged into 
KDC.
20/05/11 06:54:03 ERROR NetworkClient: [AdminClient clientId=adminclient-2] 
Connection to node -1 (localhost/127.0.0.1:54492) failed authentication due to: 
Authentication failed during authentication due to invalid credentials with 
SASL mechanism GSSAPI
{code}

I think this would be the bulletproof approach, but somehow it's not working. The JVM 
caches obtained TGTs, which I discovered here: 
https://github.com/apache/spark/pull/27877#issuecomment-597554402
The new approach may suffer from the same issue.
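The "move all the setup code into the retry area" idea can be sketched as follows. This is a hypothetical Java helper for illustration only (the names `RetrySetup` and `withRetries` are made up; the actual change is in Spark's Scala test code linked above). The key point is that each attempt re-runs the entire setup rather than retrying only the step that failed, so no half-initialized state survives between attempts:

```java
import java.util.concurrent.Callable;

// Hypothetical sketch: every attempt re-runs the *entire* setup, so a
// half-initialized state from a failed attempt is never reused.
public final class RetrySetup {

    public static <T> T withRetries(int maxAttempts, long backoffMs, Callable<T> setup)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return setup.call(); // full setup, not just the step that failed
            } catch (Exception e) {
                last = e;            // e.g. "Failed to create new KafkaAdminClient"
                Thread.sleep(backoffMs * attempt); // linear backoff between attempts
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Demo: the first two attempts fail, the third succeeds.
        int[] calls = {0};
        String broker = withRetries(5, 1L, () -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IllegalStateException("authentication failed");
            }
            return "embedded-kafka-up";
        });
        System.out.println(broker + " after " + calls[0] + " attempts");
    }
}
```

As the TGT-caching observation above suggests, this only helps if a fresh attempt can actually re-authenticate; retrying is useless when the failing state (a cached ticket) lives outside the code being retried.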


> Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed 
> to create new KafkaAdminClient
> --
>
> Key: SPARK-31679
> URL: https://issues.apache.org/jira/browse/SPARK-31679
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming, Tests
>Affects Versions: 3.0.0
>Reporter: Gabor Somogyi
>Priority: Major
>
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/
> {code:java}
> Failed
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it 
> is a sbt.testing.SuiteSelector)
> Failing for the past 1 build (Since Failed#122389 )
> Took 34 sec.
> Error Message
> org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient
> Stacktrace
> sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to 
> create new KafkaAdminClient
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479)
>   at org.apache.kafka.clients.admin.Admin.create(Admin.java:61)
>   at 
> org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267)
>   at 
> org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290)
>   at 
> org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49)
>   at 
> org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212)
>   at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210)
>   at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208)
>   at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:296)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:286)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: 
> javax.security.auth.login.LoginException: Client not found in Kerberos 
> database (6) - Client not found in Kerberos database
>   at 
> org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157)
>   at 
> org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73)
>   at 
> org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105)
>   at 
> org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454)
>   ... 17 more
> Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: 
> Client not found in Kerb