[jira] [Created] (SPARK-20332) Avro/Parquet GenericFixed decimal is not read into Spark correctly

2017-04-13 Thread Justin Pihony (JIRA)
Justin Pihony created SPARK-20332:
-

 Summary: Avro/Parquet GenericFixed decimal is not read into Spark 
correctly
 Key: SPARK-20332
 URL: https://issues.apache.org/jira/browse/SPARK-20332
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Justin Pihony
Priority: Minor


Take the following code:

{code}
spark-shell --packages org.apache.avro:avro:1.8.1
{code}

{code}
import org.apache.avro.{Conversions, LogicalTypes, Schema}
import java.math.BigDecimal

val dc = new Conversions.DecimalConversion()
val javaBD = BigDecimal.valueOf(643.85924958)
val schema = Schema.parse(
  "{\"type\":\"record\",\"name\":\"Header\",\"namespace\":\"org.apache.avro.file\",\"fields\":[" +
    "{\"name\":\"COLUMN\",\"type\":[\"null\",{\"type\":\"fixed\",\"name\":\"COLUMN\"," +
    "\"size\":19,\"precision\":17,\"scale\":8,\"logicalType\":\"decimal\"}]}]}")
val schemaDec = schema.getField("COLUMN").schema()
val fieldSchema =
  if (schemaDec.getType() == Schema.Type.UNION) schemaDec.getTypes.get(1)
  else schemaDec
val converted = dc.toFixed(javaBD, fieldSchema,
  LogicalTypes.decimal(javaBD.precision, javaBD.scale))
sqlContext.createDataFrame(List(("value", converted)))
{code}

and you'll get this error:

{code}
java.lang.UnsupportedOperationException: Schema for type org.apache.avro.generic.GenericFixed is not supported
{code}

However, if you write out a parquet file using the AvroParquetWriter and the 
above GenericFixed value (converted), then read it back in via the 
DataFrameReader, the decimal value that is retrieved is not accurate (i.e. 
the 643... value above is listed as -0.5...).
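
For reference, here is a minimal sketch (my own illustration, not Spark's internal code) of how a fixed decimal is meant to be decoded per the Avro/Parquet decimal spec: the bytes hold the unscaled value as a signed, big-endian two's-complement integer, and the scale comes from the logical type (8 in the schema above). A reader that interprets the buffer any other way can easily turn a positive value like 643... into a negative one:

{code}
import java.math.{BigDecimal, BigInteger}

// Sketch: decode a decimal stored as Avro "fixed" / Parquet FIXED_LEN_BYTE_ARRAY.
// BigInteger(byte[]) interprets the bytes as a signed big-endian
// two's-complement integer; BigDecimal then applies the scale.
def decodeFixedDecimal(bytes: Array[Byte], scale: Int): BigDecimal =
  new BigDecimal(new BigInteger(bytes), scale)
{code}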

Even if this is not supported, is there any way to at least have it throw an 
UnsupportedOperationException, as it does when you create the DataFrame directly 
(as compared to reading it in from a file)?






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-07-07 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15366259#comment-15366259
 ] 

Justin Pihony commented on SPARK-14525:
---

[~rxin] Given the bug found in SPARK-16401, the CreatableRelationProvider is 
not necessary. However, it might be nice to have now that I've already 
implemented it. I can reduce the code by removing the CreatableRelationProvider 
aspect, so I would love your feedback on my PR. 

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-06-06 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317846#comment-15317846
 ] 

Justin Pihony commented on SPARK-14525:
---

[~rxin] I have pushed my changes that now include implementing the 
CreatableRelationProvider.
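
For context, the trait boils down to a single method. A minimal sketch of the shape of that change (not the actual PR code; SaveMode handling and the returned relation are elided):

{code}
import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode}
import org.apache.spark.sql.sources.{BaseRelation, CreatableRelationProvider}

// Sketch only: the real change must also honor SaveMode semantics and
// return a relation describing the written table.
class DefaultSource extends CreatableRelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      mode: SaveMode,
      parameters: Map[String, String],
      data: DataFrame): BaseRelation = {
    val url = parameters.getOrElse("url", sys.error("url must be set"))
    val table = parameters.getOrElse("dbtable", sys.error("dbtable must be set"))
    // Delegate the write to the existing public JDBC path.
    data.write.mode(mode).jdbc(url, table, new java.util.Properties)
    ??? // BaseRelation for the written table, elided in this sketch
  }
}
{code}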

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-22 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255136#comment-15255136
 ] 

Justin Pihony commented on SPARK-14525:
---

If I am to update jdbc.DefaultSource to be a CreatableRelationProvider, then I 
also have to update it to be a SchemaRelationProvider. That, in turn, requires 
a change to the JDBCRelation class so that it can optionally accept a 
user-specified schema. This is all possible; I see it as a change from:

{code}
override val schema: StructType = JDBCRDD.resolveTable(url, table, properties)
{code}

To:

{code}
override val schema: StructType = {
  val resolvedSchema = JDBCRDD.resolveTable(url, table, properties)
  providedSchemaOption match {
    case Some(providedSchema) =>
      if (providedSchema == resolvedSchema) resolvedSchema
      else sys.error("User specified schema does not match the actual schema")
    case None => resolvedSchema
  }
}
{code}

Or, do the checking on initialization, which would not be lazy.
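
For comparison, a sketch of that eager variant (same hypothetical providedSchemaOption field, inside JDBCRelation's constructor body), which pays the resolveTable round trip up front and fails at construction time:

{code}
// Sketch of the non-lazy alternative: validate when the relation is built.
providedSchemaOption.foreach { providedSchema =>
  require(providedSchema == JDBCRDD.resolveTable(url, table, properties),
    "User specified schema does not match the actual schema")
}
{code}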

Thoughts/Preferences? Should I just skip making it a CreatableRelationProvider 
if none of the above work?

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-22 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15255130#comment-15255130
 ] 

Justin Pihony commented on SPARK-14525:
---

To address any concerns about taking Properties to a 
{code}Map[String,String]{code}, please refer to this [StackOverflow 
question|https://stackoverflow.com/questions/873510/why-does-java-util-properties-implement-mapobject-object-and-not-mapstring-st].
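
In short: java.util.Properties technically extends Hashtable[Object, Object] but is intended to hold String pairs, so the conversion is mechanical. A minimal sketch:

{code}
import scala.collection.JavaConverters._

// Sketch: JavaConverters exposes Properties as a mutable.Map[String, String],
// so an immutable snapshot is one call away.
def propertiesToMap(props: java.util.Properties): Map[String, String] =
  props.asScala.toMap
{code}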

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-21 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15252020#comment-15252020
 ] 

Justin Pihony commented on SPARK-14525:
---

I am actually going to work on this today. I just got busy and was waiting for 
any further comments. I have what I need now and will push a PR today :)

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-13 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238693#comment-15238693
 ] 

Justin Pihony commented on SPARK-14525:
---

I don't see why not since they're just key/values anyway. Here's the code I put 
together before realizing I didn't like the first option:

{code}
dataSource.providingClass.newInstance() match {
  case x: org.apache.spark.sql.execution.datasources.jdbc.DefaultSource =>
    val url = extraOptions.getOrElse("url",
      sys.error("Saving jdbc source requires url to be set. (ie. df.option(\"url\", \"ACTUAL_URL\"))"))
    val table = extraOptions.getOrElse("dbtable",
      extraOptions.getOrElse("table",
        sys.error("Saving jdbc source requires dbtable to be set. (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\"))")))
    // Rely on the impl of jdbc, which puts the user and password into the
    // properties from extraOptions anyway?
    jdbc(url, table, new java.util.Properties)
  case _ => dataSource.write(mode, df)
}
{code}

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Issue Comment Deleted] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-13 Thread Justin Pihony (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Pihony updated SPARK-14525:
--
Comment: was deleted

(was: I don't see why not since they're just key/values anyway. Here's the code 
I put together before realizing I didn't like the first option:

{code}
dataSource.providingClass.newInstance() match {
  case x: org.apache.spark.sql.execution.datasources.jdbc.DefaultSource =>
    val url = extraOptions.getOrElse("url",
      sys.error("Saving jdbc source requires url to be set. (ie. df.option(\"url\", \"ACTUAL_URL\"))"))
    val table = extraOptions.getOrElse("dbtable",
      extraOptions.getOrElse("table",
        sys.error("Saving jdbc source requires dbtable to be set. (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\"))")))
    // Rely on the impl of jdbc, which puts the user and password into the
    // properties from extraOptions anyway?
    jdbc(url, table, new java.util.Properties)
  case _ => dataSource.write(mode, df)
}
{code})

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-13 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238690#comment-15238690
 ] 

Justin Pihony commented on SPARK-14525:
---

I don't see why not since they're just key/values anyway. Here's the code I put 
together before realizing I didn't like the first option:

{code}
dataSource.providingClass.newInstance() match {
  case x: org.apache.spark.sql.execution.datasources.jdbc.DefaultSource =>
    val url = extraOptions.getOrElse("url",
      sys.error("Saving jdbc source requires url to be set. (ie. df.option(\"url\", \"ACTUAL_URL\"))"))
    val table = extraOptions.getOrElse("dbtable",
      extraOptions.getOrElse("table",
        sys.error("Saving jdbc source requires dbtable to be set. (ie. df.option(\"dbtable\", \"ACTUAL_DB_TABLE\"))")))
    // Rely on the impl of jdbc, which puts the user and password into the
    // properties from extraOptions anyway?
    jdbc(url, table, new java.util.Properties)
  case _ => dataSource.write(mode, df)
}
{code}

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Commented] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-12 Thread Justin Pihony (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15238621#comment-15238621
 ] 

Justin Pihony commented on SPARK-14525:
---

I don't mind putting together a PR for this; however, I am curious whether 
there is an opinion on the implementation. I see two options: have the save 
method redirect to the jdbc method, or move the logic from the jdbc method into 
jdbc.DefaultSource so the DataFrameWriter doesn't have to be responsible for 
it. In the latter, jdbc would delegate to save, which would delegate to 
DataSource.write, which would delegate to a new method in jdbc.DefaultSource 
(see the sketch below).

After deliberating on the seemingly unclean choice of having save redirect to 
jdbc, I am leaning towards the second option. I think it's the better design 
choice.
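
A sketch of the second option's wiring, written against DataFrameWriter's private members (so hypothetical and not compilable standalone):

{code}
// Inside DataFrameWriter (sketch): jdbc() stops doing the write itself and
// just records its arguments as options before funnelling through save().
def jdbc(url: String, table: String, connectionProperties: java.util.Properties): Unit = {
  this.extraOptions ++= Seq("url" -> url, "dbtable" -> table)
  // connectionProperties (user, password, ...) would be merged in the same way.
  format("jdbc").save()
}
{code}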

> DataFrameWriter's save method should delegate to jdbc for jdbc datasource
> -
>
> Key: SPARK-14525
> URL: https://issues.apache.org/jira/browse/SPARK-14525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.6.1
>Reporter: Justin Pihony
>Priority: Minor
>
> If you call {code}df.write.format("jdbc")...save(){code} then you get an 
> error  
> bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not 
> allow create table as select
> save is a more intuitive guess on the appropriate method to call, so the user 
> should not be punished for not knowing about the jdbc method. 
> Obviously, this will require the caller to have set up the correct parameters 
> for jdbc to work :)






[jira] [Created] (SPARK-14525) DataFrameWriter's save method should delegate to jdbc for jdbc datasource

2016-04-10 Thread Justin Pihony (JIRA)
Justin Pihony created SPARK-14525:
-

 Summary: DataFrameWriter's save method should delegate to jdbc for 
jdbc datasource
 Key: SPARK-14525
 URL: https://issues.apache.org/jira/browse/SPARK-14525
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.6.1
Reporter: Justin Pihony
Priority: Minor


If you call {code}df.write.format("jdbc")...save(){code} then you get this error:

bq. org.apache.spark.sql.execution.datasources.jdbc.DefaultSource does not allow create table as select

save is the more intuitive guess at the appropriate method to call, so the user 
should not be punished for not knowing about the jdbc method. 

Obviously, this will require the caller to have set up the correct parameters 
for jdbc to work :)






[jira] [Updated] (SPARK-13744) Dataframe RDD caching increases the input size for subsequent stages

2016-03-08 Thread Justin Pihony (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Justin Pihony updated SPARK-13744:
--
Attachment: Screen Shot 2016-03-08 at 10.35.51 AM.png

I am using the Spark UI for the input readings; please see the attached image.

> Dataframe RDD caching increases the input size for subsequent stages
> 
>
> Key: SPARK-13744
> URL: https://issues.apache.org/jira/browse/SPARK-13744
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.6.0
> Environment: OSX
>Reporter: Justin Pihony
>Priority: Minor
> Attachments: Screen Shot 2016-03-08 at 10.35.51 AM.png
>
>
> Given the below code, you will see that the first run of count shows up as 
> ~90KB, and even the next run with cache being set will result in the same 
> input size. However, every subsequent run thereafter will result in an input 
> size that is MUCH larger (500MB is listed as 38% for a default run). This 
> size discrepancy seems to be a bug in the caching of a dataframe's RDD as far 
> as I can see. 
> {code}
> import sqlContext.implicits._
> case class Person(name:String ="Test", number:Double = 1000.2)
> val people = sc.parallelize(1 to 1000,50).map { p => Person()}.toDF
> people.write.parquet("people.parquet")
> val parquetFile = sqlContext.read.parquet("people.parquet")
> parquetFile.rdd.count()
> parquetFile.rdd.cache()
> parquetFile.rdd.count()
> parquetFile.rdd.count()
> {code}






[jira] [Created] (SPARK-13744) Dataframe RDD caching increases the input size for subsequent stages

2016-03-08 Thread Justin Pihony (JIRA)
Justin Pihony created SPARK-13744:
-

 Summary: Dataframe RDD caching increases the input size for 
subsequent stages
 Key: SPARK-13744
 URL: https://issues.apache.org/jira/browse/SPARK-13744
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
 Environment: OSX
Reporter: Justin Pihony
Priority: Minor


Given the code below, you will see that the first run of count shows an input 
size of ~90KB, and even the next run, after cache has been set, results in the 
same input size. However, every subsequent run thereafter results in an input 
size that is MUCH larger (500MB, listed as 38%, for a default run). This size 
discrepancy seems to be a bug in the caching of a DataFrame's RDD as far as I 
can see. 

{code}
import sqlContext.implicits._

case class Person(name: String = "Test", number: Double = 1000.2)

val people = sc.parallelize(1 to 1000, 50).map { _ => Person() }.toDF

people.write.parquet("people.parquet")

val parquetFile = sqlContext.read.parquet("people.parquet")

parquetFile.rdd.count()
parquetFile.rdd.cache()
parquetFile.rdd.count()
parquetFile.rdd.count()
{code}






[jira] [Created] (SPARK-13127) Upgrade Parquet to 1.9 (Fixes parquet sorting)

2016-02-01 Thread Justin Pihony (JIRA)
Justin Pihony created SPARK-13127:
-

 Summary: Upgrade Parquet to 1.9 (Fixes parquet sorting)
 Key: SPARK-13127
 URL: https://issues.apache.org/jira/browse/SPARK-13127
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.6.0
Reporter: Justin Pihony
Priority: Minor


Currently, when you write a sorted DataFrame to Parquet, the data is not sorted 
by default when read back out. [This is due to a bug in 
Parquet|https://issues.apache.org/jira/browse/PARQUET-241] that was fixed in 
1.9.

There is a workaround: read the file back in using a file glob (filepath/*).
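
For example (the path here is illustrative):

{code}
// Workaround: read through a glob rather than the directory path itself.
val df = sqlContext.read.parquet("sorted-output.parquet/*")
{code}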


