[jira] [Commented] (SPARK-13359) ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6

2016-02-18 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153586#comment-15153586
 ] 

Earthson Lu commented on SPARK-13359:
-

I see:)

> ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6
> 
>
> Key: SPARK-13359
> URL: https://issues.apache.org/jira/browse/SPARK-13359
> Project: Spark
>  Issue Type: Bug
>  Components: ML
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>Priority: Minor
>
> backport fix for https://issues.apache.org/jira/browse/SPARK-12746



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-13359) ArrayType(_, true) should also accept ArrayType(_, false) fix for branch-1.6

2016-02-16 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-13359:
---

 Summary: ArrayType(_, true) should also accept ArrayType(_, false) 
fix for branch-1.6
 Key: SPARK-13359
 URL: https://issues.apache.org/jira/browse/SPARK-13359
 Project: Spark
  Issue Type: Bug
  Components: ML
Affects Versions: 1.6.0
Reporter: Earthson Lu
Priority: Minor
 Fix For: 1.6.1


backport fix for https://issues.apache.org/jira/browse/SPARK-12746






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122871#comment-15122871
 ] 

Earthson Lu commented on SPARK-12746:
-

Hi Joseph, what is the status of nullability now?

It seems someone has already added a multi-DataType check; I've merged upstream to 
use their implementation.

I was just wondering if you could accept this PR?

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.
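
The containment the report describes can be illustrated with a minimal, self-contained sketch (plain Python stand-ins for Spark SQL's ArrayType; the helper name `accepts` is hypothetical, not Spark's actual API):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayType:
    """Minimal stand-in for Spark SQL's ArrayType(elementType, containsNull)."""
    element_type: str
    contains_null: bool

def accepts(required: ArrayType, actual: ArrayType) -> bool:
    """Nullability-aware check: a column whose elements can never be null
    (containsNull=False) is safe wherever nulls are merely allowed
    (containsNull=True), so it should be accepted there too."""
    return (required.element_type == actual.element_type
            and (required.contains_null or not actual.contains_null))

# A strict equality check (the behavior the report complains about) would
# reject the second case; the containment-style check accepts it.
required = ArrayType("string", True)
print(accepts(required, ArrayType("string", True)))                    # True
print(accepts(required, ArrayType("string", False)))                   # True
print(accepts(ArrayType("string", False), ArrayType("string", True)))  # False
```

The last case shows the asymmetry: a schema that forbids nulls cannot accept a column that may contain them.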






[jira] [Issue Comment Deleted] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-12746:

Comment: was deleted

(was: Hi Joseph, what is the status of nullability now?

It seems someone has already added a multi-DataType check; I've merged upstream to 
use their implementation.

I was just wondering if you could accept this PR?)

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-28 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122870#comment-15122870
 ] 

Earthson Lu commented on SPARK-12746:
-

Hi Joseph, what is the status of nullability now?

It seems someone has already added a multi-DataType check; I've merged upstream to 
use their implementation.

I was just wondering if you could accept this PR?

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15096120#comment-15096120
 ] 

Earthson Lu commented on SPARK-12746:
-

I was just wondering if you could do a review:)

On Tue, Jan 12, 2016 at 10:14 AM, Apache Spark (JIRA) 




-- 

~
Perfection is achieved
not when there is nothing more to add
 but when there is nothing left to take away


> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Issue Comment Deleted] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-12746:

Comment: was deleted

(was: I was just wondering if you could do a review:)

On Tue, Jan 12, 2016 at 10:14 AM, Apache Spark (JIRA) 




-- 

~
Perfection is achieved
not when there is nothing more to add
 but when there is nothing left to take away
)

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-13 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097724#comment-15097724
 ] 

Earthson Lu commented on SPARK-12746:
-

OK, I see:)

If there's no nullability in ML, how could we implement a Transformer to fill 
missing values (always represented as NULL)? I think we need to support 
nullability in preprocessing, so we can get clean data for further operations. I 
can't imagine a situation where we could do nothing when the data contains NULL.

- - -

I think the type-checking API is independent of nullability in ML. It is a 
common case that one transformer accepts both BooleanType and IntType. Maybe it 
is a good idea for the test condition and the assertion to be implemented 
separately.
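
The separation suggested above (a reusable accepted-type condition, with the assertion kept in one place) could look roughly like this sketch; `check_column`, the schema shape, and the error message are all hypothetical, not spark.ml's actual API:

```python
def check_column(schema: dict, col: str, accepted_types: set) -> None:
    """Condition and assertion implemented separately: the membership test
    is plain data (a set of accepted type names), while error reporting
    lives in a single shared place."""
    actual = schema[col]
    condition = actual in accepted_types  # e.g. {"boolean", "int"}
    assert condition, (
        f"Column {col} must be one of {sorted(accepted_types)}, got {actual}")

schema = {"flag": "boolean", "count": "int", "name": "string"}
check_column(schema, "flag", {"boolean", "int"})   # passes
check_column(schema, "count", {"boolean", "int"})  # passes
# check_column(schema, "name", {"boolean", "int"}) would raise AssertionError
```

Because the condition is just set membership, a transformer that accepts several DataTypes only has to enumerate them, rather than duplicating the check-and-raise logic.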

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Updated] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-11 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-12746:

Description: 
I see CountVectorizer has schema check for ArrayType which has 
ArrayType(StringType, true). 

ArrayType(String, false) is just a special case of ArrayType(String, true), but 
it will not pass this type check.

  was:
I see CountVectorizer has schema check for ArrayType which has 
ArrayType(StringType, true). 

ArrayType(String, false) is just a special case of ArrayType(String, false), 
but it will not pass this type check.


> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Updated] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-11 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-12746:

Shepherd: Joseph K. Bradley  (was: Xiangrui Meng)

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, true), 
> but it will not pass this type check.






[jira] [Created] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-12746:
---

 Summary: ArrayType(_, true) should also accept ArrayType(_, false)
 Key: SPARK-12746
 URL: https://issues.apache.org/jira/browse/SPARK-12746
 Project: Spark
  Issue Type: Bug
  Components: ML, SQL
Affects Versions: 1.6.0
Reporter: Earthson Lu


I see CountVectorizer has schema check for ArrayType which has 
ArrayType(StringType, true). 

ArrayType(String, false) is just a special case of ArrayType(String, false), 
but it will not pass this type check.






[jira] [Commented] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091487#comment-15091487
 ] 

Earthson Lu commented on SPARK-12746:
-

I could work on this:)

I have some ideas:

1. we could implement a more powerful type check api
2. check manually for all the case

I will choose the latter

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, false), 
> but it will not pass this type check.






[jira] [Comment Edited] (SPARK-12746) ArrayType(_, true) should also accept ArrayType(_, false)

2016-01-10 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15091487#comment-15091487
 ] 

Earthson Lu edited comment on SPARK-12746 at 1/11/16 6:11 AM:
--

I could work on this:)

I have some ideas:

1. we could implement a more powerful type check api
2. check manually for all the cases

I will choose the latter


was (Author: earthsonlu):
I could work on this:)

I have some ideas:

1. we could implement a more powerful type check api
2. check manually for all the case

I will choose the latter

> ArrayType(_, true) should also accept ArrayType(_, false)
> -
>
> Key: SPARK-12746
> URL: https://issues.apache.org/jira/browse/SPARK-12746
> Project: Spark
>  Issue Type: Bug
>  Components: ML, SQL
>Affects Versions: 1.6.0
>Reporter: Earthson Lu
>
> I see CountVectorizer has schema check for ArrayType which has 
> ArrayType(StringType, true). 
> ArrayType(String, false) is just a special case of ArrayType(String, false), 
> but it will not pass this type check.






[jira] [Commented] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012815#comment-15012815
 ] 

Earthson Lu commented on SPARK-6725:


I'm glad to give help:)

> Model export/import for Pipeline API
> 
>
> Key: SPARK-6725
> URL: https://issues.apache.org/jira/browse/SPARK-6725
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API.  
> This JIRA is for adding the internal Saveable/Loadable API and Parquet-based 
> format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load 
> functions, but the model metadata must store a different class name (marking 
> the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces 
> future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames.  
> Other libraries and formats can support this, and it would be great if we 
> could too.  We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a 
> (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote.  If you have 
> comments about the planned implementation, please comment in this JIRA.  
> Thanks!  
> [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]






[jira] [Comment Edited] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012815#comment-15012815
 ] 

Earthson Lu edited comment on SPARK-6725 at 11/19/15 6:34 AM:
--

-I'm glad to give some help:) Does it mean to do some unit tests?-

I'm sorry, I have to focus on my own work now and may not have time to help 
with the 1.6 release~


was (Author: earthsonlu):
I'm glad to give some help:) Does it mean to do some unit tests?

> Model export/import for Pipeline API
> 
>
> Key: SPARK-6725
> URL: https://issues.apache.org/jira/browse/SPARK-6725
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API.  
> This JIRA is for adding the internal Saveable/Loadable API and Parquet-based 
> format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load 
> functions, but the model metadata must store a different class name (marking 
> the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces 
> future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames.  
> Other libraries and formats can support this, and it would be great if we 
> could too.  We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a 
> (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote.  If you have 
> comments about the planned implementation, please comment in this JIRA.  
> Thanks!  
> [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]






[jira] [Comment Edited] (SPARK-6725) Model export/import for Pipeline API

2015-11-18 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15012815#comment-15012815
 ] 

Earthson Lu edited comment on SPARK-6725 at 11/19/15 5:14 AM:
--

I'm glad to give some help:) Does it mean to do some unit tests?


was (Author: earthsonlu):
I'm glad to give help:)

> Model export/import for Pipeline API
> 
>
> Key: SPARK-6725
> URL: https://issues.apache.org/jira/browse/SPARK-6725
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API.  
> This JIRA is for adding the internal Saveable/Loadable API and Parquet-based 
> format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load 
> functions, but the model metadata must store a different class name (marking 
> the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces 
> future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames.  
> Other libraries and formats can support this, and it would be great if we 
> could too.  We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a 
> (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote.  If you have 
> comments about the planned implementation, please comment in this JIRA.  
> Thanks!  
> [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]






[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-17 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15010111#comment-15010111
 ] 

Earthson Lu commented on SPARK-6727:


It’s fine:) 

I can give some help when the API is ready.

And I have a suggestion for the DefaultWritable/Readable:
{code}
/**
 * Default Writable using DefaultParamsWriter
 */
@Experimental
@Since("1.6.0")
trait DefaultWritable extends Writable {
  self: Params =>

  override def write: Writer = new DefaultParamsWriter(self)
}

/**
 * Default Readable using DefaultParamsReader
 * @tparam T ML instance type
 */
@Experimental
@Since("1.6.0")
trait DefaultReadable[T] extends Readable[T] {
  override def read: Reader[T] = new DefaultParamsReader[T]
}
{code}

I don't know whether this interface is complicated enough for this trait style. 
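
The Scala traits above mix a default params-only writer into any class that carries Params. A rough Python analogue of the same mixin pattern, under the assumption (stated in the thread) that a non-model Transformer has nothing to persist beyond its Params; all class names here are hypothetical:

```python
class DefaultParamsWriter:
    """Writes only an instance's params, mirroring the idea that a
    params-only Transformer needs no other state persisted."""
    def __init__(self, instance):
        self.instance = instance

    def save(self) -> dict:
        # Serialize just the params; a real writer would also persist metadata.
        return dict(self.instance.params)

class DefaultWritable:
    """Mixin: any class exposing a `params` dict gets a default writer,
    playing the role of the Scala trait's self-type on Params."""
    def write(self) -> DefaultParamsWriter:
        return DefaultParamsWriter(self)

class HashingTFLike(DefaultWritable):
    """Hypothetical params-only transformer."""
    def __init__(self, num_features: int):
        self.params = {"num_features": num_features}

saved = HashingTFLike(1024).write().save()
print(saved)  # {'num_features': 1024}
```

The transformer itself never implements `write`; it only opts into the mixin, which is the point of the trait-style suggestion.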


-- 
Earthson Lu

On November 17, 2015 at 10:41:11, Joseph K. Bradley (JIRA) (j...@apache.org) 
wrote:


[ 
https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15007900#comment-15007900
 ]  

Joseph K. Bradley commented on SPARK-6727:  
--  

[~EarthsonLu] Apologies for the slow response, but we've been working on adding 
this in one big batch in [SPARK-11769]. I appreciate your PR, but would you 
mind holding off on working on these export/import JIRAs for a little? We'll 
post publicly once they are ready to be taken up; the problem is that we are 
still tweaking the API a little as we fill it out. Thank you!  





--  
This message was sent by Atlassian JIRA  
(v6.3.4#6332)  


> Model export/import for spark.ml: HashingTF
> ---
>
> Key: SPARK-6727
> URL: https://issues.apache.org/jira/browse/SPARK-6727
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>







[jira] [Issue Comment Deleted] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-11-16 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-8332:
---
Comment: was deleted

(was: SparkUI does not work after upgrading fasterxml.jackson to 2.5.3)

> NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> --
>
> Key: SPARK-8332
> URL: https://issues.apache.org/jira/browse/SPARK-8332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
> Environment: spark 1.4 & hadoop 2.3.0-cdh5.0.0
>Reporter: Tao Li
>Priority: Critical
>  Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson
>
> I compiled the new Spark 1.4.0 version. 
> But when I run a simple WordCount demo, it throws NoSuchMethodError 
> {code}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> {code}
> I found out that the default "fasterxml.jackson.version" is 2.4.4. 
> Is there anything wrong with, or a conflict in, the Jackson version? 
> Or is there possibly some project Maven dependency containing the wrong 
> version of Jackson?






[jira] [Commented] (SPARK-6726) Model export/import for spark.ml: LogisticRegression

2015-11-11 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15000192#comment-15000192
 ] 

Earthson Lu commented on SPARK-6726:


Is the API ready for subtasks? I can do some work:)

> Model export/import for spark.ml: LogisticRegression
> 
>
> Key: SPARK-6726
> URL: https://issues.apache.org/jira/browse/SPARK-6726
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-6725) Model export/import for Pipeline API

2015-11-11 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001578#comment-15001578
 ] 

Earthson Lu commented on SPARK-6725:


Can we expect this API to be usable in Spark 1.6.0? I can help:)

> Model export/import for Pipeline API
> 
>
> Key: SPARK-6725
> URL: https://issues.apache.org/jira/browse/SPARK-6725
> Project: Spark
>  Issue Type: Umbrella
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
>Priority: Critical
>
> This is an umbrella JIRA for adding model export/import to the spark.ml API.  
> This JIRA is for adding the internal Saveable/Loadable API and Parquet-based 
> format, not for other formats like PMML.
> This will require the following steps:
> * Add export/import for all PipelineStages supported by spark.ml
> ** This will include some Transformers which are not Models.
> ** These can use almost the same format as the spark.mllib model save/load 
> functions, but the model metadata must store a different class name (marking 
> the class as a spark.ml class).
> * After all PipelineStages support save/load, add an interface which forces 
> future additions to support save/load.
> *UPDATE*: In spark.ml, we could save feature metadata using DataFrames.  
> Other libraries and formats can support this, and it would be great if we 
> could too.  We could do either of the following:
> * save() optionally takes a dataset (or schema), and load will return a 
> (model, schema) pair.
> * Models themselves save the input schema.
> Both options would mean inheriting from new Saveable, Loadable types.
> *UPDATE: DESIGN DOC*: Here's a design doc which I wrote.  If you have 
> comments about the planned implementation, please comment in this JIRA.  
> Thanks!  
> [https://docs.google.com/document/d/1RleM4QiKwdfZZHf0_G6FBNaF7_koc1Ui7qfMT1pf4IA/edit?usp=sharing]






[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-11 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001599#comment-15001599
 ] 

Earthson Lu commented on SPARK-6727:


It seems that we could implement a default reader/writer for non-model 
Transformers? Only Params need to be saved/read?

> Model export/import for spark.ml: HashingTF
> ---
>
> Key: SPARK-6727
> URL: https://issues.apache.org/jira/browse/SPARK-6727
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>







[jira] [Commented] (SPARK-6790) Model export/import for spark.ml: LinearRegression

2015-11-11 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001739#comment-15001739
 ] 

Earthson Lu commented on SPARK-6790:


I'm sorry, it's a PR for SPARK-6727~

> Model export/import for spark.ml: LinearRegression
> --
>
> Key: SPARK-6790
> URL: https://issues.apache.org/jira/browse/SPARK-6790
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>







[jira] [Commented] (SPARK-6727) Model export/import for spark.ml: HashingTF

2015-11-11 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001590#comment-15001590
 ] 

Earthson Lu commented on SPARK-6727:


Is the API ready? Can I work on this?

> Model export/import for spark.ml: HashingTF
> ---
>
> Key: SPARK-6727
> URL: https://issues.apache.org/jira/browse/SPARK-6727
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 1.3.0
>Reporter: Joseph K. Bradley
>







[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-10-19 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14964562#comment-14964562
 ] 

Earthson Lu commented on SPARK-8332:


SparkUI does not work after upgrading fasterxml.jackson to 2.5.3

> NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> --
>
> Key: SPARK-8332
> URL: https://issues.apache.org/jira/browse/SPARK-8332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
> Environment: spark 1.4 & hadoop 2.3.0-cdh5.0.0
>Reporter: Tao Li
>Priority: Critical
>  Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson
>
> I compiled the new Spark 1.4.0 version. 
> But when I run a simple WordCount demo, it throws a NoSuchMethodError: 
> {code}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> {code}
> I found out that the default "fasterxml.jackson.version" is 2.4.4. 
> Is there anything wrong with, or a conflict in, the Jackson version? 
> Or does some project Maven dependency possibly pull in the wrong 
> version of Jackson?
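For anyone hitting the same conflict: `mvn dependency:tree` shows which transitive dependency drags in a second Jackson release, and a common workaround is pinning every Jackson artifact to one version via `dependencyManagement`. This is only a sketch; the artifact list and the pinned version shown are illustrative assumptions, not the project's actual build configuration:

{code:xml}
<dependencyManagement>
  <dependencies>
    <!-- Pin all Jackson modules to a single version so Spark and
         application dependencies cannot mix incompatible releases. -->
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.4.4</version>
    </dependency>
    <dependency>
      <groupId>com.fasterxml.jackson.module</groupId>
      <artifactId>jackson-module-scala_2.10</artifactId>
      <version>2.4.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}

The same technique applies if you pin to 2.5.3 instead; what matters is that jackson-core, jackson-databind, and jackson-module-scala all resolve to the same version.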








[jira] [Issue Comment Deleted] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-10-19 Thread Earthson Lu (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Earthson Lu updated SPARK-8332:
---
Comment: was deleted

(was: SparkUI not works when upgrade fasterxml.jackson to 2.5.3)

> NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> --
>
> Key: SPARK-8332
> URL: https://issues.apache.org/jira/browse/SPARK-8332
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.4.0
> Environment: spark 1.4 & hadoop 2.3.0-cdh5.0.0
>Reporter: Tao Li
>Priority: Critical
>  Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson
>
> I compiled the new Spark 1.4.0 version. 
> But when I run a simple WordCount demo, it throws a NoSuchMethodError: 
> {code}
> java.lang.NoSuchMethodError: 
> com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
> {code}
> I found out that the default "fasterxml.jackson.version" is 2.4.4. 
> Is there anything wrong with, or a conflict in, the Jackson version? 
> Or does some project Maven dependency possibly pull in the wrong 
> version of Jackson?






[jira] [Comment Edited] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-07-22 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636919#comment-14636919
 ] 

Earthson Lu edited comment on SPARK-8332 at 7/22/15 1:40 PM:
-

I recompiled spark with fasterxml.jackson 2.5.3, it works with play-2.4.x

Is this ok to use 2.5.3 instead of 2.4.4?


was (Author: earthsonlu):
I recompiled spark with fasterxml.jackson 2.5.3, it works with play-2.4.x

I want to know: is this ok to use 2.5.3 instead of 2.4.4?

 NoSuchMethodError: 
 com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
 --

 Key: SPARK-8332
 URL: https://issues.apache.org/jira/browse/SPARK-8332
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
 Environment: spark 1.4  hadoop 2.3.0-cdh5.0.0
Reporter: Tao Li
Priority: Critical
  Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson

 I compiled the new Spark 1.4.0 version. 
 But when I run a simple WordCount demo, it throws a NoSuchMethodError: 
 {code}
 java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
 {code}
 I found out that the default fasterxml.jackson.version is 2.4.4. 
 Is there anything wrong with, or a conflict in, the Jackson version? 
 Or does some project Maven dependency possibly pull in the wrong 
 version of Jackson?






[jira] [Commented] (SPARK-8332) NoSuchMethodError: com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer

2015-07-22 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636919#comment-14636919
 ] 

Earthson Lu commented on SPARK-8332:


I recompiled spark with fasterxml.jackson 2.5.3, it works with play-2.4.x

I want to know: is this ok to use 2.5.3 instead of 2.4.4?

 NoSuchMethodError: 
 com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
 --

 Key: SPARK-8332
 URL: https://issues.apache.org/jira/browse/SPARK-8332
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.4.0
 Environment: spark 1.4  hadoop 2.3.0-cdh5.0.0
Reporter: Tao Li
Priority: Critical
  Labels: 1.4.0, NoSuchMethodError, com.fasterxml.jackson

 I compiled the new Spark 1.4.0 version. 
 But when I run a simple WordCount demo, it throws a NoSuchMethodError: 
 {code}
 java.lang.NoSuchMethodError: 
 com.fasterxml.jackson.module.scala.deser.BigDecimalDeserializer
 {code}
 I found out that the default fasterxml.jackson.version is 2.4.4. 
 Is there anything wrong with, or a conflict in, the Jackson version? 
 Or does some project Maven dependency possibly pull in the wrong 
 version of Jackson?






[jira] [Comment Edited] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377490#comment-14377490
 ] 

Earthson Lu edited comment on SPARK-6465 at 3/25/15 5:25 AM:
-

I'm confused.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L94

{code:scala}
  def convertRowToScala(r: Row, schema: StructType): Row = {
// TODO: This is very slow!!!
new GenericRowWithSchema( // Why do we need GenericRowWithSchema? This seems to 
be the only use of GenericRowWithSchema
  r.toSeq.zip(schema.fields.map(_.dataType))
.map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
  }
{code}


was (Author: earthsonlu):
I'm confused.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L94

```scala
  def convertRowToScala(r: Row, schema: StructType): Row = {
// TODO: This is very slow!!!
new GenericRowWithSchema( // Why do we need GenericRowWithSchema? This seems to 
be the only use of GenericRowWithSchema
  r.toSeq.zip(schema.fields.map(_.dataType))
.map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
  }
```

 GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg 
 constructor):
 --

 Key: SPARK-6465
 URL: https://issues.apache.org/jira/browse/SPARK-6465
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: Spark 1.3, YARN 2.6.0, CentOS
Reporter: Earthson Lu
Assignee: Michael Armbrust
Priority: Critical
   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 I cannot find an existing issue for this. 
 The registration for GenericRowWithSchema is missing from 
 org.apache.spark.sql.execution.SparkSqlSerializer.
 Is this the only thing we need to do?
 Here is the log
 {code}
 15/03/23 16:21:00 WARN TaskSetManager: Lost task 9.0 in stage 20.0 (TID 
 31978, datanode06.site): com.esotericsoftware.kryo.KryoException: Class 
 cannot be created (missing no-arg constructor): 
 org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
 at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
 at 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
 at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 {code}
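While the fix itself belongs in SparkSqlSerializer, a user-side workaround for this class of Kryo failure is to register the class through a custom registrator. A minimal sketch, assuming Spark 1.3; the `RowRegistrator` name is hypothetical, and whether plain registration suffices depends on which serializer Kryo then selects for the class:

{code:scala}
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical workaround: register the class that Kryo fails to create.
class RowRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(
      classOf[org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema])
  }
}

// Wire the registrator into the Spark configuration.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", classOf[RowRegistrator].getName)
{code}

If registration alone still trips the missing no-arg constructor check, an explicit serializer (or a different Kryo instantiator strategy) would be needed for that class.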




[jira] [Comment Edited] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377490#comment-14377490
 ] 

Earthson Lu edited comment on SPARK-6465 at 3/25/15 5:26 AM:
-

I'm confused.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L94

{code}
  def convertRowToScala(r: Row, schema: StructType): Row = {
// TODO: This is very slow!!!
new GenericRowWithSchema( // Why do we need GenericRowWithSchema? This seems to 
be the only use of GenericRowWithSchema
  r.toSeq.zip(schema.fields.map(_.dataType))
.map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
  }
{code}


was (Author: earthsonlu):
I'm confused.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L94

{code:scala}
  def convertRowToScala(r: Row, schema: StructType): Row = {
// TODO: This is very slow!!!
new GenericRowWithSchema( // Why do we need GenericRowWithSchema? This seems to 
be the only use of GenericRowWithSchema
  r.toSeq.zip(schema.fields.map(_.dataType))
.map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
  }
{code}

 GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg 
 constructor):
 --

 Key: SPARK-6465
 URL: https://issues.apache.org/jira/browse/SPARK-6465
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: Spark 1.3, YARN 2.6.0, CentOS
Reporter: Earthson Lu
Assignee: Michael Armbrust
Priority: Critical
   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 I cannot find an existing issue for this. 
 The registration for GenericRowWithSchema is missing from 
 org.apache.spark.sql.execution.SparkSqlSerializer.
 Is this the only thing we need to do?
 Here is the log
 {code}
 15/03/23 16:21:00 WARN TaskSetManager: Lost task 9.0 in stage 20.0 (TID 
 31978, datanode06.site): com.esotericsoftware.kryo.KryoException: Class 
 cannot be created (missing no-arg constructor): 
 org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
 at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
 at 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
 at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 {code}




[jira] [Commented] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-24 Thread Earthson Lu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14377490#comment-14377490
 ] 

Earthson Lu commented on SPARK-6465:


I'm confused.

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L94

```scala
  def convertRowToScala(r: Row, schema: StructType): Row = {
// TODO: This is very slow!!!
new GenericRowWithSchema( // Why do we need GenericRowWithSchema? This seems to 
be the only use of GenericRowWithSchema
  r.toSeq.zip(schema.fields.map(_.dataType))
.map(r_dt => convertToScala(r_dt._1, r_dt._2)).toArray, schema)
  }
```

 GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg 
 constructor):
 --

 Key: SPARK-6465
 URL: https://issues.apache.org/jira/browse/SPARK-6465
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.3.0
 Environment: Spark 1.3, YARN 2.6.0, CentOS
Reporter: Earthson Lu
   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 I cannot find an existing issue for this. 
 The registration for GenericRowWithSchema is missing from 
 org.apache.spark.sql.execution.SparkSqlSerializer.
 Is this the only thing we need to do?
 Here is the log
 ```
 15/03/23 16:21:00 WARN TaskSetManager: Lost task 9.0 in stage 20.0 (TID 
 31978, datanode06.site): com.esotericsoftware.kryo.KryoException: Class 
 cannot be created (missing no-arg constructor): 
 org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
 at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
 at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
 at 
 com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
 at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
 at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
 at 
 org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
 at 
 org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
 at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
 at 
 org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
 at 
 org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
 at 
 org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
 at 
 org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
 at 
 org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
 at org.apache.spark.scheduler.Task.run(Task.scala:64)
 at 
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 ```






[jira] [Created] (SPARK-6465) GenericRowWithSchema: KryoException: Class cannot be created (missing no-arg constructor):

2015-03-23 Thread Earthson Lu (JIRA)
Earthson Lu created SPARK-6465:
--

 Summary: GenericRowWithSchema: KryoException: Class cannot be 
created (missing no-arg constructor):
 Key: SPARK-6465
 URL: https://issues.apache.org/jira/browse/SPARK-6465
 Project: Spark
  Issue Type: Bug
  Components: DataFrame
Affects Versions: 1.3.0
 Environment: Spark 1.3, YARN 2.6.0, CentOS
Reporter: Earthson Lu


I cannot find an existing issue for this. 

The registration for GenericRowWithSchema is missing from 
org.apache.spark.sql.execution.SparkSqlSerializer.

Is this the only thing we need to do?

Here is the log
```
15/03/23 16:21:00 WARN TaskSetManager: Lost task 9.0 in stage 20.0 (TID 31978, 
datanode06.site): com.esotericsoftware.kryo.KryoException: Class cannot be 
created (missing no-arg constructor): 
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1050)
at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1062)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:228)
at 
com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:217)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:42)
at com.twitter.chill.Tuple2Serializer.read(TupleSerializers.scala:33)
at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:732)
at 
org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:138)
at 
org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
at 
org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.sql.execution.joins.HashJoin$$anon$1.hasNext(HashJoin.scala:66)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at 
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:217)
at 
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
```


