[jira] [Comment Edited] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-15 Thread Alberto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742872#comment-16742872
 ] 

Alberto edited comment on SPARK-26611 at 1/15/19 8:56 AM:
--

Tried both with a Debian docker container running on Ubuntu 18.04 and on
Databricks runtime 5.0.

The environment does not seem to make any difference.
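
For reference, the versions involved can be confirmed with a quick check like
this (a sketch; it assumes the check runs in the same Python environment that
the Spark workers use):

{code:python}
# Print the versions reported in this ticket (Spark 2.4.0, pandas 0.19.2,
# pyarrow 0.8.0) for the current interpreter.
import pandas
import pyarrow
import pyspark

print("pyspark:", pyspark.__version__)
print("pandas: ", pandas.__version__)
print("pyarrow:", pyarrow.__version__)
{code}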


was (Author: afumagalli):
Tried both with a Debian docker container running on Ubuntu and on Databricks
runtime 5.0.

The environment does not seem to make any difference.

> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]
>  
> Using:
> spark 2.4.0
> Pandas 0.19.2
> Pyarrow 0.8.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-15 Thread Alberto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742872#comment-16742872
 ] 

Alberto commented on SPARK-26611:
-

Tried both with a Debian docker container running on Ubuntu and on Databricks
runtime 5.0.

The environment does not seem to make any difference.

> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]
>  
> Using:
> spark 2.4.0
> Pandas 0.19.2
> Pyarrow 0.8.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742140#comment-16742140
 ] 

Alberto commented on SPARK-26611:
-

Updating pyarrow to version 0.9 seems to solve the problem.
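
In case it helps anyone else hitting this, here is a minimal sketch (assuming a
running SparkSession called spark) for confirming which pyarrow version the
Python workers actually pick up, since the driver and the executors can end up
with different environments:

{code:python}
import pyarrow

print("driver pyarrow:", pyarrow.__version__)

def _worker_pyarrow_version(_):
    # Imported inside the function so it runs in the executor's Python process.
    import pyarrow
    yield pyarrow.__version__

executor_versions = set(
    spark.sparkContext.parallelize(range(8), 8)
         .mapPartitions(_worker_pyarrow_version)
         .collect())
print("executor pyarrow:", executor_versions)
{code}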

> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]
>  
> Using:
> spark 2.4.0
> Pandas 0.19.2
> Pyarrow 0.8.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    df = df[df['first']=="9"]
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

 

Using:

spark 2.4.0

Pandas 0.19.2

Pyarrow 0.8.0

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    df = df[df['first']=="9"]
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
> See stacktrace 
> 

[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    df = df[df['first']=="9"]
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    df = df[df['first']=="9"]
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> 

[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    df = df[df['first']=="9"]
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> 

[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Component/s: (was: SQL)

> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> df = df[df['first']=="9"]
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: 

[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
>  
> {code:java}
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-26611:

Description: 
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]

  was:
The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return pd.DataFrame({"first":[],"second":[]})

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]


> GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"
> ---
>
> Key: SPARK-26611
> URL: https://issues.apache.org/jira/browse/SPARK-26611
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.0
>Reporter: Alberto
>Priority: Major
>  Labels: UDF, pandas, pyspark
>
> The following snippet crashes with error: org.apache.spark.SparkException: 
> Python worker exited unexpectedly (crashed)
> {code:java}
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']=="9"]
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  while this one does not:
> {code:java}
> df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], 
> ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> return df[df['first']==9]
> df.groupby("second").apply(filter_pandas).count()
> {code}
> and neither this:
> {code:java}
> df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), 
> ("5","6")], ("first","second"))
> @pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
> def filter_pandas(df):
> if len(df)>0:
> return df
> else:
> return pd.DataFrame({"first":[],"second":[]})
> df.groupby("second").apply(filter_pandas).count()
> {code}
>  
>  
> See stacktrace 
> [here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26611) GROUPED_MAP pandas_udf crashing "Python worker exited unexpectedly"

2019-01-14 Thread Alberto (JIRA)
Alberto created SPARK-26611:
---

 Summary: GROUPED_MAP pandas_udf crashing "Python worker exited 
unexpectedly"
 Key: SPARK-26611
 URL: https://issues.apache.org/jira/browse/SPARK-26611
 Project: Spark
  Issue Type: Bug
  Components: PySpark, SQL
Affects Versions: 2.4.0
Reporter: Alberto


The following snippet crashes with error: org.apache.spark.SparkException: 
Python worker exited unexpectedly (crashed)
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
while this one does not:
{code:java}
df = spark.createDataFrame([(1,2),(2,2),(2,3), (3,4), (5,6)], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    return df[df['first']==9]

df.groupby("second").apply(filter_pandas).count()
{code}
and neither this:
{code:java}
df = spark.createDataFrame([("1","2"),("2","2"),("2","3"), ("3","4"), ("5","6")], ("first","second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(df):
    if len(df)>0:
        return df
    else:
        return df[df['first']=="9"]

df.groupby("second").apply(filter_pandas).count()
{code}
 

 

See stacktrace 
[here|https://gist.github.com/afumagallireply/02d4c1355bc64a9d2129cdd6d0e9d9f3]
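
For anyone who wants to reproduce this outside an interactive shell, here is a
self-contained sketch of the failing case above. It only adds the imports and
the SparkSession setup that the snippets assume (the application name is just
illustrative); run it against Spark 2.4.0 with pandas 0.19.2 and pyarrow 0.8.0
to see the crash.

{code:python}
from pyspark.sql import SparkSession
from pyspark.sql.functions import pandas_udf, PandasUDFType

spark = SparkSession.builder.appName("spark-26611-repro").getOrCreate()

df = spark.createDataFrame(
    [("1", "2"), ("2", "2"), ("2", "3"), ("3", "4"), ("5", "6")],
    ("first", "second"))

@pandas_udf("first string, second string", PandasUDFType.GROUPED_MAP)
def filter_pandas(pdf):
    # Every group is filtered down to zero rows, which is the case that
    # crashes with "Python worker exited unexpectedly" on the versions above.
    return pdf[pdf["first"] == "9"]

print(df.groupby("second").apply(filter_pandas).count())

spark.stop()
{code}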



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21640) Method mode with String parameters within DataFrameWriter is error prone

2017-08-04 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114317#comment-16114317
 ] 

Alberto edited comment on SPARK-21640 at 8/4/17 12:49 PM:
--

Yes, that's it [~srowen]. I don't like using strings for this kind of thing, but in
this case I have to.

If there is an enum, I think the "best worst" option is to use its values when
calling the method. And since the other values work, I guess it should also be
valid for the errorifexists case.


was (Author: ardlema):
Yes, that's it [~srowen]. I don't like using strings for this kind of thing, but in
this case I have to.

If there is an enum, I think the "best worst" option is to use its values when
calling it. And since the other values work, I guess it should also be valid for
the errorifexists case.

> Method mode with String parameters within DataFrameWriter is error prone
> 
>
> Key: SPARK-21640
> URL: https://issues.apache.org/jira/browse/SPARK-21640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Alberto
>Priority: Trivial
>
> The following method:
> {code:java}
> def mode(saveMode: String): DataFrameWriter[T]
> {code}
> sets the SaveMode of the DataFrameWriter depending on the string that is
> passed in as a parameter.
> There is a java Enum with all the save modes which are Append, Overwrite, 
> ErrorIfExists and Ignore. In my current project I was writing some code that 
> was using this enum to get the string value that I use to call the mode 
> method:
> {code:java}
>   private[utils] val configModeAppend = SaveMode.Append.toString.toLowerCase
>   private[utils] val configModeErrorIfExists = 
> SaveMode.ErrorIfExists.toString.toLowerCase
>   private[utils] val configModeIgnore = SaveMode.Ignore.toString.toLowerCase
>   private[utils] val configModeOverwrite = 
> SaveMode.Overwrite.toString.toLowerCase
> {code}
> The configModeErrorIfExists val contains the value "errorifexists", but when I
> call the mode method using this string it does not match. I suggest including
> "errorifexists" as a valid match for the ErrorIfExists SaveMode.
> Will create a PR to address this issue ASAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21640) Method mode with String parameters within DataFrameWriter is error prone

2017-08-04 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16114317#comment-16114317
 ] 

Alberto commented on SPARK-21640:
-

Yes, that's it [~srowen]. I don't like using strings for this kind of thing, but in
this case I have to.

If there is an enum, I think the "best worst" option is to use its values when
calling it. And since the other values work, I guess it should also be valid for
the errorifexists case.

> Method mode with String parameters within DataFrameWriter is error prone
> 
>
> Key: SPARK-21640
> URL: https://issues.apache.org/jira/browse/SPARK-21640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.0
>Reporter: Alberto
>Priority: Trivial
>
> The following method:
> {code:java}
> def mode(saveMode: String): DataFrameWriter[T]
> {code}
> sets the SaveMode of the DataFrameWriter depending on the string that is
> passed in as a parameter.
> There is a java Enum with all the save modes which are Append, Overwrite, 
> ErrorIfExists and Ignore. In my current project I was writing some code that 
> was using this enum to get the string value that I use to call the mode 
> method:
> {code:java}
>   private[utils] val configModeAppend = SaveMode.Append.toString.toLowerCase
>   private[utils] val configModeErrorIfExists = 
> SaveMode.ErrorIfExists.toString.toLowerCase
>   private[utils] val configModeIgnore = SaveMode.Ignore.toString.toLowerCase
>   private[utils] val configModeOverwrite = 
> SaveMode.Overwrite.toString.toLowerCase
> {code}
> The configModeErrorIfExists val contains the value "errorifexists", but when I
> call the mode method using this string it does not match. I suggest including
> "errorifexists" as a valid match for the ErrorIfExists SaveMode.
> Will create a PR to address this issue ASAP.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21640) Method mode with String parameters within DataFrameWriter is error prone

2017-08-04 Thread Alberto (JIRA)
Alberto created SPARK-21640:
---

 Summary: Method mode with String parameters within DataFrameWriter 
is error prone
 Key: SPARK-21640
 URL: https://issues.apache.org/jira/browse/SPARK-21640
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.2.0
Reporter: Alberto


The following method:
{code:java}
def mode(saveMode: String): DataFrameWriter[T]
{code}
sets the SaveMode of the DataFrameWriter depending on the string that is
passed in as a parameter.

There is a Java enum with all the save modes, which are Append, Overwrite,
ErrorIfExists and Ignore. In my current project I was writing some code that
used this enum to get the string values that I pass to the mode method:


{code:java}
  private[utils] val configModeAppend = SaveMode.Append.toString.toLowerCase
  private[utils] val configModeErrorIfExists = 
SaveMode.ErrorIfExists.toString.toLowerCase
  private[utils] val configModeIgnore = SaveMode.Ignore.toString.toLowerCase
  private[utils] val configModeOverwrite = 
SaveMode.Overwrite.toString.toLowerCase
{code}

The configModeErrorIfExists val contains the value "errorifexists", but when I
call the mode method using this string it does not match. I suggest including
"errorifexists" as a valid match for the ErrorIfExists SaveMode.

Will create a PR to address this issue ASAP.
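
As an illustration of the mismatch (a sketch, not code from this project): the
PySpark DataFrameWriter.mode call passes the string straight through to the
Scala method above, and SaveMode.ErrorIfExists.toString.toLowerCase produces
"errorifexists", which the Spark 2.2.0 string matcher does not accept. The
output paths and the SparkSession variable spark are assumptions here.

{code:python}
df = spark.createDataFrame([(1, "a")], ("id", "value"))

# Accepted by Spark 2.2.0:
df.write.mode("error").parquet("/tmp/spark-21640/ok")

# The lowercased enum name is rejected with an IllegalArgumentException
# ("Unknown save mode: errorifexists") before the change proposed here:
df.write.mode("errorifexists").parquet("/tmp/spark-21640/fails")
{code}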






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-9687) System.exit() still disrupt applications embedding Spark

2015-12-14 Thread Alberto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto closed SPARK-9687.
--

> System.exit() still disrupt applications embedding Spark
> 
>
> Key: SPARK-9687
> URL: https://issues.apache.org/jira/browse/SPARK-9687
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.4.1
>Reporter: Alberto
>Priority: Minor
>
> This issue was already reported in #SPARK-4783. It was addressed in the 
> following PR: 5492 but we are still having the same issue.
> The TaskSchedulerImpl class is now throwing a SparkException, this exception 
> is caught by the SparkUncaughtExceptionHandler which is again invoking a 
> System.exit()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9687) System.exit() still disrupt applications embedding Spark

2015-08-06 Thread Alberto (JIRA)
Alberto created SPARK-9687:
--

 Summary: System.exit() still disrupt applications embedding Spark
 Key: SPARK-9687
 URL: https://issues.apache.org/jira/browse/SPARK-9687
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.4.1
Reporter: Alberto
Priority: Minor


This issue was already reported in SPARK-4783. It was addressed in the
following PR: 5492, but we are still seeing the same issue.

The TaskSchedulerImpl class now throws a SparkException; this exception is
caught by the SparkUncaughtExceptionHandler, which again invokes
System.exit().





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9687) System.exit() still disrupt applications embedding Spark

2015-08-06 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659860#comment-14659860
 ] 

Alberto commented on SPARK-9687:


Sorry about that, let's discuss in SPARK-4783 then

 System.exit() still disrupt applications embedding Spark
 

 Key: SPARK-9687
 URL: https://issues.apache.org/jira/browse/SPARK-9687
 Project: Spark
  Issue Type: Improvement
Affects Versions: 1.4.1
Reporter: Alberto
Priority: Minor

 This issue was already reported in #SPARK-4783. It was addressed in the 
 following PR: 5492 but we are still having the same issue.
 The TaskSchedulerImpl class is now throwing a SparkException, this exception 
 is caught by the SparkUncaughtExceptionHandler which is again invoking a 
 System.exit()



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4783) System.exit() calls in SparkContext disrupt applications embedding Spark

2015-08-06 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14659864#comment-14659864
 ] 

Alberto commented on SPARK-4783:


Still having this issue. We've found out that the exception thrown by
TaskSchedulerImpl is being caught by the SparkUncaughtExceptionHandler, which is
calling System.exit() again.

Would it make sense to just log the error and not throw the exception? See
https://github.com/apache/spark/pull/7993

 System.exit() calls in SparkContext disrupt applications embedding Spark
 

 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: David Semeria
Assignee: Sean Owen
Priority: Minor
 Fix For: 1.4.0


 A common architectural choice for integrating Spark within a larger 
 application is to employ a gateway to handle Spark jobs. The gateway is a 
 server which contains one or more long-running sparkcontexts.
 A typical server is created with the following pseudo code:
 var continue = true
 while (continue) {
   try {
     server.run()
   } catch (e) {
     continue = log_and_examine_error(e)
   }
 }
 The problem is that sparkcontext frequently calls System.exit when it 
 encounters a problem which means the server can only be re-spawned at the 
 process level, which is much more messy than the simple code above.
 Therefore, I believe it makes sense to replace all System.exit calls in 
 sparkcontext with the throwing of a fatal error. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4783) System.exit() calls in SparkContext disrupt applications embedding Spark

2015-04-13 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14491993#comment-14491993
 ] 

Alberto commented on SPARK-4783:


Does it mean that you guys are going to create a PR with a fix/change proposal 
for this? Or just asking someone to create that PR? If so I am willing to 
create it.

 System.exit() calls in SparkContext disrupt applications embedding Spark
 

 Key: SPARK-4783
 URL: https://issues.apache.org/jira/browse/SPARK-4783
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: David Semeria

 A common architectural choice for integrating Spark within a larger 
 application is to employ a gateway to handle Spark jobs. The gateway is a 
 server which contains one or more long-running sparkcontexts.
 A typical server is created with the following pseudo code:
 var continue = true
 while (continue) {
   try {
     server.run()
   } catch (e) {
     continue = log_and_examine_error(e)
   }
 }
 The problem is that sparkcontext frequently calls System.exit when it 
 encounters a problem which means the server can only be re-spawned at the 
 process level, which is much more messy than the simple code above.
 Therefore, I believe it makes sense to replace all System.exit calls in 
 sparkcontext with the throwing of a fatal error. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6804) System.exit(1) on error

2015-04-09 Thread Alberto (JIRA)
Alberto created SPARK-6804:
--

 Summary: System.exit(1) on error
 Key: SPARK-6804
 URL: https://issues.apache.org/jira/browse/SPARK-6804
 Project: Spark
  Issue Type: Improvement
Reporter: Alberto


We are developing a web application that is using Spark under the hood. Testing
our app, we have found out that when our Spark master is not up and running and
we try to connect to it, Spark kills our app.

We've been having a look at the code and we have noticed that the
TaskSchedulerImpl class is simply killing the JVM, so our web application is
obviously also killed. See the code snippet I am talking about:

else {
  // No task sets are active but we still got an error. Just exit since this
  // must mean the error is during registration.
  // It might be good to do something smarter here in the future.
  logError("Exiting due to error from cluster scheduler: " + message)
  System.exit(1)
}

IMHO this code should not invoke System.exit(1). Instead, it should throw an
exception so that applications are able to handle the error.


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-6804) System.exit(1) on error

2015-04-09 Thread Alberto (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alberto updated SPARK-6804:
---
Description: 
We are developing a web application that is using Spark under the hood. Testing
our app, we have found out that when our Spark master is not up and running and
we try to connect to it, Spark kills our app.

We've been having a look at the code and we have noticed that the
TaskSchedulerImpl class is simply killing the JVM, so our web application is
obviously also killed. See the code snippet I am talking about:

{code}
else {
  // No task sets are active but we still got an error. Just exit since this
  // must mean the error is during registration.
  // It might be good to do something smarter here in the future.
  logError("Exiting due to error from cluster scheduler: " + message)
  System.exit(1)
}
{code}

IMHO this code should not invoke System.exit(1). Instead, it should throw an
exception so that applications are able to handle the error.

  was:
We are developing a web application that is using Spark under the hood. Testing
our app, we have found out that when our Spark master is not up and running and
we try to connect to it, Spark kills our app.

We've been having a look at the code and we have noticed that the
TaskSchedulerImpl class is simply killing the JVM, so our web application is
obviously also killed. See the code snippet I am talking about:

else {
  // No task sets are active but we still got an error. Just exit since this
  // must mean the error is during registration.
  // It might be good to do something smarter here in the future.
  logError("Exiting due to error from cluster scheduler: " + message)
  System.exit(1)
}

IMHO this code should not invoke System.exit(1). Instead, it should throw an
exception so that applications are able to handle the error.


 System.exit(1) on error
 ---

 Key: SPARK-6804
 URL: https://issues.apache.org/jira/browse/SPARK-6804
 Project: Spark
  Issue Type: Improvement
Reporter: Alberto

 We are developing a web application that is using Spark under the hood. 
 Testing our app we have found out that when our spark master is not up and 
 running and we try to connect with it, Spark is killing our app. 
 We've been having a look at the code and we have noticed that the 
 TaskSchedulerImpl class is just killing the JVM and our web application is 
 obviously also killed. See following the code snippet I am talking about:
 {code}
 else {
 // No task sets are active but we still got an error. Just exit since 
 this
 // must mean the error is during registration.
 // It might be good to do something smarter here in the future.
 logError(Exiting due to error from cluster scheduler:  + message)
 System.exit(1)
   }
 {code}
 IMHO this guy should not invoke System.exit(1). Instead, it should throw an 
 exception so the applications will be able to handle the error.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6431) Couldn't find leader offsets exception when creating KafkaDirectStream

2015-04-07 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482752#comment-14482752
 ] 

Alberto commented on SPARK-6431:


I think you're right, Cody. I've had a look at my code and found out that I'm 
not creating the topic before creating the DirectStream. If you are interested, 
here is the test I am running: 
https://github.com/ardlema/big-brother/blob/master/src/test/scala/org/ardlema/spark/DwellDetectorTest.scala

I completely agree with you: if the topic doesn't exist, it should return an 
error and not a misleading empty set.
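
For reference, a minimal sketch of creating the topic up front before building 
the direct stream (Kafka 0.8 admin API and Spark 1.3-style KafkaUtils assumed; 
the broker/ZooKeeper addresses and the topic name are placeholders):

{code}
import kafka.admin.AdminUtils
import kafka.serializer.StringDecoder
import kafka.utils.ZKStringSerializer
import org.I0Itec.zkclient.ZkClient
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("direct-stream-example").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

// Create the topic first so that createDirectStream can resolve leader offsets for it.
val zkClient = new ZkClient("localhost:2181", 10000, 10000, ZKStringSerializer)
AdminUtils.createTopic(zkClient, "test-topic", 1, 1)

val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Set("test-topic"))
{code}

The same idea applies to the linked test: make sure the topic exists in the 
test setup before the DirectStream is created.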



 Couldn't find leader offsets exception when creating KafkaDirectStream
 --

 Key: SPARK-6431
 URL: https://issues.apache.org/jira/browse/SPARK-6431
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.3.0
Reporter: Alberto

 When I try to create an InputDStream using the createDirectStream method of 
 the KafkaUtils class and the Kafka topic does not have any messages yet, I am 
 getting the following error:
 org.apache.spark.SparkException: Couldn't find leader offsets for Set()
 org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't 
 find leader offsets for Set()
   at 
 org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:413)
 If I put a message in the topic before creating the DirectStream, everything 
 works fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-6431) Couldn't find leader offsets exception when creating KafkaDirectStream

2015-04-07 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14482752#comment-14482752
 ] 

Alberto edited comment on SPARK-6431 at 4/7/15 7:43 AM:


You're absolutely right, Cody. I've had a look at my code and found out that 
I'm not creating the topic before creating the DirectStream. If you are 
interested, here is the test I am running: 
https://github.com/ardlema/big-brother/blob/master/src/test/scala/org/ardlema/spark/DwellDetectorTest.scala

I completely agree with you: if the topic doesn't exist, it should return an 
error and not a misleading empty set.




was (Author: ardlema):
I think you're right Cody. I've been having a look at my code and I've found 
out that I'm not creating the topic before creating the DirectStream. If you 
are interested here is the test I am running: 
https://github.com/ardlema/big-brother/blob/master/src/test/scala/org/ardlema/spark/DwellDetectorTest.scala

I completely agree with you, If the topic doesn't exist it should be returning 
an error and not a misleading empty set.



 Couldn't find leader offsets exception when creating KafkaDirectStream
 --

 Key: SPARK-6431
 URL: https://issues.apache.org/jira/browse/SPARK-6431
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.3.0
Reporter: Alberto

 When I try to create an InputDStream using the createDirectStream method of 
 the KafkaUtils class and the Kafka topic does not have any messages yet, I am 
 getting the following error:
 org.apache.spark.SparkException: Couldn't find leader offsets for Set()
 org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't 
 find leader offsets for Set()
   at 
 org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:413)
 If I put a message in the topic before creating the DirectStream, everything 
 works fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-6431) Couldn't find leader offsets exception when creating KafkaDirectStream

2015-03-20 Thread Alberto (JIRA)
Alberto created SPARK-6431:
--

 Summary: Couldn't find leader offsets exception when creating 
KafkaDirectStream
 Key: SPARK-6431
 URL: https://issues.apache.org/jira/browse/SPARK-6431
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.3.0
Reporter: Alberto


When I try to create an InputDStream using the createDirectStream method of the 
KafkaUtils class and the Kafka topic does not have any messages yet, I am 
getting the following error:

org.apache.spark.SparkException: Couldn't find leader offsets for Set()
org.apache.spark.SparkException: org.apache.spark.SparkException: Couldn't find 
leader offsets for Set()
at 
org.apache.spark.streaming.kafka.KafkaUtils$$anonfun$createDirectStream$2.apply(KafkaUtils.scala:413)

If I put a message in the topic before creating the DirectStream, everything 
works fine.
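
A minimal sketch of the scenario being described (Spark 1.3 direct-stream API 
assumed; the broker address and topic name are placeholders):

{code}
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("empty-topic-repro").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(5))

// When the topic has not been created yet (as discussed in the comments above),
// this call fails with:
//   org.apache.spark.SparkException: Couldn't find leader offsets for Set()
val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, Map("metadata.broker.list" -> "localhost:9092"), Set("missing-topic"))
{code}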



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5281) Registering table on RDD is giving MissingRequirementError

2015-02-24 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334942#comment-14334942
 ] 

Alberto commented on SPARK-5281:


I'm having this problem as well while trying to migrate to 1.2.1. Any updates?

 Registering table on RDD is giving MissingRequirementError
 --

 Key: SPARK-5281
 URL: https://issues.apache.org/jira/browse/SPARK-5281
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: sarsol
Priority: Critical

 Application crashes on the line  rdd.registerTempTable("temp")  in version 1.2 
 when using sbt or the Eclipse Scala IDE.
 Stacktrace: 
 Exception in thread "main" scala.reflect.internal.MissingRequirementError: 
 class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with 
 primordial classloader with boot classpath 
 [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program
  Files\Java\jre7\lib\resources.jar;C:\Program 
 Files\Java\jre7\lib\rt.jar;C:\Program 
 Files\Java\jre7\lib\sunrsasign.jar;C:\Program 
 Files\Java\jre7\lib\jsse.jar;C:\Program 
 Files\Java\jre7\lib\jce.jar;C:\Program 
 Files\Java\jre7\lib\charsets.jar;C:\Program 
 Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found.
   at 
 scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
   at 
 scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
   at 
 scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
   at scala.reflect.api.Universe.typeOf(Universe.scala:59)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33)
   at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111)
   at 
 com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43)
   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
   at 
 scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at 
 scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
   at scala.App$class.main(App.scala:71)
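
For context, a minimal sketch of the kind of code that hits this, using the 
Spark 1.2-era SchemaRDD API (class and object names here are illustrative, not 
taken from the original report):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Top-level case class so Spark SQL can reflect on its fields.
case class Record(first: String, second: String)

object TempTableExample extends App {
  val sc = new SparkContext(new SparkConf().setAppName("temp-table").setMaster("local[2]"))
  val sqlContext = new SQLContext(sc)
  import sqlContext.createSchemaRDD // implicit RDD[Product] => SchemaRDD conversion in Spark 1.2

  val rdd = sc.parallelize(Seq(Record("1", "2"), Record("2", "3")))
  rdd.registerTempTable("temp") // the call that crashes with MissingRequirementError under sbt / Scala IDE
  sqlContext.sql("SELECT * FROM temp").collect().foreach(println)
}
{code}

The stack trace above goes through scala.App's delayedInit and lists the Scala 
IDE plugin jars on the boot classpath, matching the sbt / Eclipse Scala IDE 
setups named in the report.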



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-5281) Registering table on RDD is giving MissingRequirementError

2015-02-24 Thread Alberto (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14334942#comment-14334942
 ] 

Alberto edited comment on SPARK-5281 at 2/24/15 2:45 PM:
-

I'm having this problem as well while trying to migrate from 1.1.1 to 1.2.1. Any updates?


was (Author: ardlema):
Having this problem as well trying to migrate to 1.2.1. Any updates?

 Registering table on RDD is giving MissingRequirementError
 --

 Key: SPARK-5281
 URL: https://issues.apache.org/jira/browse/SPARK-5281
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.2.0
Reporter: sarsol
Priority: Critical

 Application crashes on the line  rdd.registerTempTable("temp")  in version 1.2 
 when using sbt or the Eclipse Scala IDE.
 Stacktrace: 
 Exception in thread "main" scala.reflect.internal.MissingRequirementError: 
 class org.apache.spark.sql.catalyst.ScalaReflection in JavaMirror with 
 primordial classloader with boot classpath 
 [C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-library.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-reflect.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-actor.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-swing.jar;C:\sar\scala\scala-ide\eclipse\plugins\org.scala-ide.scala210.jars_4.0.0.201407240952\target\jars\scala-compiler.jar;C:\Program
  Files\Java\jre7\lib\resources.jar;C:\Program 
 Files\Java\jre7\lib\rt.jar;C:\Program 
 Files\Java\jre7\lib\sunrsasign.jar;C:\Program 
 Files\Java\jre7\lib\jsse.jar;C:\Program 
 Files\Java\jre7\lib\jce.jar;C:\Program 
 Files\Java\jre7\lib\charsets.jar;C:\Program 
 Files\Java\jre7\lib\jfr.jar;C:\Program Files\Java\jre7\classes] not found.
   at 
 scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
   at 
 scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
   at 
 scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:61)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticModuleOrClass(Mirrors.scala:72)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:119)
   at 
 scala.reflect.internal.Mirrors$RootsBase.staticClass(Mirrors.scala:21)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$$typecreator1$1.apply(ScalaReflection.scala:115)
   at 
 scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe$lzycompute(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$WeakTypeTagImpl.tpe(TypeTags.scala:231)
   at scala.reflect.api.TypeTags$class.typeOf(TypeTags.scala:335)
   at scala.reflect.api.Universe.typeOf(Universe.scala:59)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:115)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:100)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:33)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$class.attributesFor(ScalaReflection.scala:94)
   at 
 org.apache.spark.sql.catalyst.ScalaReflection$.attributesFor(ScalaReflection.scala:33)
   at org.apache.spark.sql.SQLContext.createSchemaRDD(SQLContext.scala:111)
   at 
 com.sar.spark.dq.poc.SparkPOC$delayedInit$body.apply(SparkPOC.scala:43)
   at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
   at 
 scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.App$$anonfun$main$1.apply(App.scala:71)
   at scala.collection.immutable.List.foreach(List.scala:318)
   at 
 scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
   at scala.App$class.main(App.scala:71)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org