[jira] [Updated] (SPARK-19035) rand() function in case when cause will failed
[ https://issues.apache.org/jira/browse/SPARK-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated SPARK-19035:
--
Description:
*In this case:*
select case when a=1 then 1 else concat(a, cast(rand() as string)) end b, count(1)
from yuanfeng1_a
group by case when a=1 then 1 else concat(a, cast(rand() as string)) end;
*Throws this error:*
Error in query: expression 'yuanfeng1_a.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [CASE WHEN (a#2075 = 1) THEN cast(1 as string) ELSE concat(cast(a#2075 as string), cast(rand(519367429988179997) as string)) END], [CASE WHEN (a#2075 = 1) THEN cast(1 as string) ELSE concat(cast(a#2075 as string), cast(rand(8090243936131101651) as string)) END AS b#2074]
+- MetastoreRelation default, yuanfeng1_a
The simpler query select case when a=1 then 1 else rand() end b, count(1) from yuanfeng1_a group by case when a=1 then rand() end produces the same error.
*Notice*: if rand() is replaced with 1, the query works.
was: (the same description, without the second example query)
> rand() function in case when cause will failed
> --
>
> Key: SPARK-19035
> URL: https://issues.apache.org/jira/browse/SPARK-19035
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0, 2.0.1, 2.0.2
> Reporter: Feng Yuan
>
> (description as above)
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19035) rand() function in case when cause will failed
[ https://issues.apache.org/jira/browse/SPARK-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated SPARK-19035:
--
Summary: rand() function in case when cause will failed (was: nested functions in case when statement will failed)
> (issue metadata and description as above)
[jira] [Updated] (SPARK-19035) nested functions in case when statement will failed
[ https://issues.apache.org/jira/browse/SPARK-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated SPARK-19035:
--
Description: (as above; this update appended the *Notice*: line about replacing rand() with 1)
> (issue metadata and description as above)
[jira] [Updated] (SPARK-19035) nested functions in case when statement will failed
[ https://issues.apache.org/jira/browse/SPARK-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated SPARK-19035:
--
Description: (as above; this update only bolded the *In this case:* and *Throw error:* labels)
> (issue metadata and description as above)
[jira] [Updated] (SPARK-19035) nested functions in case when statement will failed
[ https://issues.apache.org/jira/browse/SPARK-19035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Feng Yuan updated SPARK-19035:
--
Description: (as above; this update dropped the spark-sql console transcript from the original report, which is preserved in the Created message below)
> (issue metadata and description as above)
[jira] [Created] (SPARK-19035) nested functions in case when statement will failed
Feng Yuan created SPARK-19035:
-
Summary: nested functions in case when statement will failed
Key: SPARK-19035
URL: https://issues.apache.org/jira/browse/SPARK-19035
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 2.0.2, 2.0.1, 2.0.0
Reporter: Feng Yuan

In this case:
select case when a=1 then 1 else concat(a, cast(rand() as string)) end b, count(1)
from yuanfeng1_a
group by case when a=1 then 1 else concat(a, cast(rand() as string)) end;
This throws the error:
Error in query: expression 'yuanfeng1_a.`a`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [CASE WHEN (a#2075 = 1) THEN cast(1 as string) ELSE concat(cast(a#2075 as string), cast(rand(519367429988179997) as string)) END], [CASE WHEN (a#2075 = 1) THEN cast(1 as string) ELSE concat(cast(a#2075 as string), cast(rand(8090243936131101651) as string)) END AS b#2074]
+- MetastoreRelation default, yuanfeng1_a
Console session:
spark-sql> select case when a=1 then 1 else concat(a,cast(rand() as string)) end b,count(1) from yuanfeng1_a group by case when a=1 then 1 else concat(a,cast(rand() as string)) end;
16/12/30 15:05:55 INFO execution.SparkSqlParser: Parsing command: select case when a=1 then 1 else concat(a,cast(rand() as string)) end b,count(1) from yuanfeng1_a group by case when a=1 then 1 else concat(a,cast(rand() as string)) end
16/12/30 15:05:55 INFO parser.CatalystSqlParser: Parsing command: int
Error in query: expression 'yuanfeng1_a.`a`' is neither present in the group by, nor is it an aggregate function.
Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [CASE WHEN (a#2077 = 1) THEN cast(1 as string) ELSE concat(cast(a#2077 as string), cast(rand(-8113865568189974672) as string)) END], [CASE WHEN (a#2077 = 1) THEN cast(1 as string) ELSE concat(cast(a#2077 as string), cast(rand(-824889479508647173) as string)) END AS b#2076, count(1) AS count(1)#2079L]
+- MetastoreRelation default, yuanfeng1_a
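The errors above come down to how each textual occurrence of rand() is parsed: every occurrence captures its own fresh seed (visible as rand(519367429988179997) vs. rand(8090243936131101651) in the Aggregate node), so the SELECT copy and the GROUP BY copy of the expression are not semantically equal, and the analyzer cannot match them. A minimal Python sketch of that mechanism, using hypothetical Rand/Column classes and an analyzer_accepts check that greatly simplify Catalyst's actual behavior:

```python
import itertools

_seed_counter = itertools.count()  # deterministic stand-in for fresh random seeds

class Rand:
    """Stand-in for Catalyst's Rand expression: each parsed occurrence
    captures its own fresh seed (hypothetical simplification)."""
    def __init__(self):
        self.seed = next(_seed_counter)

    def semantically_equal(self, other):
        # Two Rand expressions only match if their seeds match.
        return isinstance(other, Rand) and self.seed == other.seed

class Column:
    """Stand-in for a deterministic attribute reference."""
    def __init__(self, name):
        self.name = name

    def semantically_equal(self, other):
        return isinstance(other, Column) and self.name == other.name

def analyzer_accepts(select_exprs, group_by_exprs):
    """Mimic the analyzer rule: every non-aggregate SELECT expression
    must semantically match some GROUP BY expression."""
    return all(any(s.semantically_equal(g) for g in group_by_exprs)
               for s in select_exprs)

# Deterministic column: the SELECT and GROUP BY copies compare equal.
assert analyzer_accepts([Column("a")], [Column("a")])

# rand(): the two textual copies parse with different seeds, so the
# check fails -- mirroring the SPARK-19035 error message.
assert not analyzer_accepts([Rand()], [Rand()])
```

Consistent with the *Notice* in the report, a common workaround is to materialize the non-deterministic expression once (e.g. in a subquery) and group on the resulting column.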
[jira] [Created] (SPARK-19034) Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
Sanjay Dasgupta created SPARK-19034:
---
Summary: Download packages on 'spark.apache.org/downloads.html' contain release 2.0.2
Key: SPARK-19034
URL: https://issues.apache.org/jira/browse/SPARK-19034
Project: Spark
Issue Type: Bug
Components: Build
Affects Versions: 2.1.0
Environment: All
Reporter: Sanjay Dasgupta

The download packages on 'https://spark.apache.org/downloads.html' have the right name (spark-2.1.0-bin-...) but actually contain the release 2.0.2 software.
[jira] [Commented] (SPARK-19033) HistoryServer still uses old ACLs even if ACLs are updated
[ https://issues.apache.org/jira/browse/SPARK-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15787031#comment-15787031 ] Saisai Shao commented on SPARK-19033:
-
Ping [~vanzin], I found that you made this change; would you mind explaining the purpose of doing so? Thanks very much.
> HistoryServer still uses old ACLs even if ACLs are updated
> --
>
> Key: SPARK-19033
> URL: https://issues.apache.org/jira/browse/SPARK-19033
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.1.0
> Reporter: Saisai Shao
> Priority: Minor
>
> In the current implementation of HistoryServer, application ACLs are picked from the event log rather than from configuration:
> {code}
> val uiAclsEnabled = conf.getBoolean("spark.history.ui.acls.enable", false)
> ui.getSecurityManager.setAcls(uiAclsEnabled)
> // make sure to set admin acls before view acls so they are properly picked up
> ui.getSecurityManager.setAdminAcls(appListener.adminAcls.getOrElse(""))
> ui.getSecurityManager.setViewAcls(attempt.sparkUser, appListener.viewAcls.getOrElse(""))
> ui.getSecurityManager.setAdminAclsGroups(appListener.adminAclsGroups.getOrElse(""))
> ui.getSecurityManager.setViewAclsGroups(appListener.viewAclsGroups.getOrElse(""))
> {code}
> This becomes a problem when the ACLs are updated (e.g. an admin is newly added): only new applications are affected, while old applications still use the old ACLs, so the new admins still cannot view the logs of old applications. It is hard to say this is a bug, but in our scenario this is not the behavior we expect.
[jira] [Updated] (SPARK-19033) HistoryServer still uses old ACLs even if ACLs are updated
[ https://issues.apache.org/jira/browse/SPARK-19033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao updated SPARK-19033:
--
Summary: HistoryServer still uses old ACLs even if ACLs are updated (was: HistoryServer will honor old ACLs even if ACLs are updated)
> (issue metadata and description as above)
[jira] [Created] (SPARK-19033) HistoryServer will honor old ACLs even if ACLs are updated
Saisai Shao created SPARK-19033:
---
Summary: HistoryServer will honor old ACLs even if ACLs are updated
Key: SPARK-19033
URL: https://issues.apache.org/jira/browse/SPARK-19033
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.1.0
Reporter: Saisai Shao
Priority: Minor

(description as quoted in the comment above)
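The behavior the reporter wants could be sketched as merging the History Server's current configuration into the ACLs replayed from the event log, instead of letting the logged values win outright. A hedged Python sketch of that merge, with hypothetical key names (not Spark's actual API):

```python
def effective_acls(current_conf, logged_acls):
    """Merge admin ACLs from the server's current configuration with
    those replayed from the event log (hypothetical key names)."""
    merged = (set(logged_acls.get("adminAcls", "").split(","))
              | set(current_conf.get("history.admin.acls", "").split(",")))
    merged.discard("")  # drop empties produced by blank settings
    return sorted(merged)

# A newly added admin in the current config can now see old apps too,
# while admins recorded at application run time are preserved.
acls = effective_acls({"history.admin.acls": "new_admin"},
                      {"adminAcls": "old_admin"})
assert acls == ["new_admin", "old_admin"]
```

The design point is that server-side configuration is re-read at replay time, so ACL updates apply retroactively to already-completed applications.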
[jira] [Commented] (SPARK-18986) ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator
[ https://issues.apache.org/jira/browse/SPARK-18986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786971#comment-15786971 ] Sameer Kumar commented on SPARK-18986:
--
Shouldn't the priority of this be increased? I am facing this issue on almost every batch interval, and the data doesn't get processed any further, which is a significant data loss for any application.
> ExternalAppendOnlyMap shouldn't fail when forced to spill before calling its iterator
> -
>
> Key: SPARK-18986
> URL: https://issues.apache.org/jira/browse/SPARK-18986
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Reporter: Liang-Chi Hsieh
>
> {{ExternalAppendOnlyMap.forceSpill}} now uses an assert to check that an iterator is not null in the map. However, the assertion only holds after the map has been asked for its iterator. Before that, if another memory consumer asks for more memory than is currently available, {{ExternalAppendOnlyMap.forceSpill}} is also called. In this case, we see a failure like this:
> {code}
> [info] java.lang.AssertionError: assertion failed
> [info] at scala.Predef$.assert(Predef.scala:156)
> [info] at org.apache.spark.util.collection.ExternalAppendOnlyMap.forceSpill(ExternalAppendOnlyMap.scala:196)
> [info] at org.apache.spark.util.collection.Spillable.spill(Spillable.scala:111)
> [info] at org.apache.spark.util.collection.ExternalAppendOnlyMapSuite$$anonfun$13.apply$mcV$sp(ExternalAppendOnlyMapSuite.scala:294)
> {code}
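The failure mode in the quoted description can be modeled with a toy spillable map: forceSpill asserts that a read iterator exists, but another memory consumer can force a spill before iteration starts. A minimal Python sketch of one possible guard (hypothetical names and a deliberately simplified fix, not the actual Spark patch):

```python
class ToySpillableMap:
    """Toy model of ExternalAppendOnlyMap; only the spill-guard
    logic relevant to SPARK-18986 is sketched here."""
    def __init__(self):
        self.current_map = {}
        self.read_iterator = None  # set once iterator() is called

    def iterator(self):
        self.read_iterator = iter(self.current_map.items())
        return self.read_iterator

    def force_spill(self):
        # The buggy version does `assert self.read_iterator is not None`
        # here, which crashes when another memory consumer forces a
        # spill before iteration has started.
        if self.read_iterator is None:
            # Guard instead of asserting: report nothing was spilled.
            # (A real fix might spill the in-memory map directly.)
            return False
        self.current_map = {}  # pretend the data was written to disk
        return True

m = ToySpillableMap()
assert m.force_spill() is False  # no AssertionError before iteration
m.current_map["k"] = 1
m.iterator()
assert m.force_spill() is True   # spilling works once reading started
```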
[jira] [Comment Edited] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786883#comment-15786883 ] Liang-Chi Hsieh edited comment on SPARK-19032 at 12/30/16 4:50 AM:
---
I think you cannot guarantee the sort order per group in an aggregation under the current API. One workaround is the combination of repartition + sortWithinPartitions, as I mentioned in the discussion.
{code}
df.repartition($"account").sortWithinPartitions($"account", $"probability".desc).groupBy($"account").agg(first($"product"), first($"probability"))
{code}
It should work. But this is still not guaranteed by the API: if the internal implementation of aggregation changes, deterministic results can no longer be guaranteed.
was (Author: viirya): (the same comment, without {code} formatting)
> Non-deterministic results using aggregation first across multiple workers
> -
>
> Key: SPARK-19032
> URL: https://issues.apache.org/jira/browse/SPARK-19032
> Project: Spark
> Issue Type: Bug
> Components: Optimizer
> Affects Versions: 1.6.1
> Environment: Standalone Spark 1.6.1 cluster on EC2 with 2 worker nodes, one executor each.
> Reporter: Harry Weppner
>
> We've come across a situation where results aggregated using {{first}} on a sorted df are non-deterministic. Given the explanation of the plan there appears to be a plausible explanation, but it raises more questions about the usefulness of these aggregation functions in a Spark cluster.
> Here's a minimal example to reproduce:
> {code}
> val df = sc.parallelize(Seq(("a","prod1",0.6),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4))).toDF("account","product","probability")
> var p = df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).show();
> +-------+----------------+--------------------+
> |account|first(product)()|first(probability)()|
> +-------+----------------+--------------------+
> |      a|           prod1|                 0.6|
> +-------+----------------+--------------------+
> p: Unit = ()
> // Repeat and notice that the result will occasionally be different
> +-------+----------------+--------------------+
> |account|first(product)()|first(probability)()|
> +-------+----------------+--------------------+
> |      a|           prod2|                 0.4|
> +-------+----------------+--------------------+
> p: Unit = ()
> scala> df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).explain(true);
> == Parsed Logical Plan ==
> 'Aggregate ['account], [unresolvedalias('account),(first('product)(),mode=Complete,isDistinct=false) AS first(product)()#523,(first('probability)(),mode=Complete,isDistinct=false) AS first(probability)()#524]
> +- Sort [probability#5 DESC], true
>    +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5]
>       +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27
> == Analyzed Logical Plan ==
> account: string, first(product)(): string, first(probability)(): double
> Aggregate [account#3], [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) AS first(probability)()#524]
> +- Sort [probability#5 DESC], true
>    +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5]
>       +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27
> == Optimized Logical Plan ==
> Aggregate [account#3], [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) AS first(probability)()#524]
> +- Sort [probability#5 DESC], true
>    +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5]
>       +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27
> == Physical Plan ==
> SortBasedAggregate(key=[account#3], functions=[(first(product#4)(),mode=Final,isDistinct=false),(first(probability#5)(),mode=Final,isDistinct=false)], output=[account#3,first(product)()#523,first(probability)()#524])
> +- ConvertToSafe
>    +- Sort [account#3 ASC], false, 0
>       +-
[jira] [Commented] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786883#comment-15786883 ] Liang-Chi Hsieh commented on SPARK-19032: - I think you can not guarantee the sort order per group in an aggregation under the current API. One workaround is the combination of repartition + sortWithinPartitions as I mentioned in the discussion. df.repartition($"account").sortWithinPartitions($"account", $"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")) It should work. But this is still not guaranteed by the API. If the internal implementation of aggregation is changed, then it can't guarantee deterministic results again. > Non-deterministic results using aggregation first across multiple workers > - > > Key: SPARK-19032 > URL: https://issues.apache.org/jira/browse/SPARK-19032 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 1.6.1 > Environment: Standalone Spark 1.6.1 cluster on EC2 with 2 worker > nodes, one executor each. >Reporter: Harry Weppner > > We've come across a situation results aggregated using {{first}} on a sorted > df are non-deterministic. Given the explanation for the plan there appears to > be a plausible explanation but creates more question on the usefulness of > these aggregation functions in a spark cluster. 
> Here's a minimal example to reproduce: > {code} > val df = > sc.parallelize(Seq(("a","prod1",0.6),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4))).toDF("account","product","probability") > var p = > df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).show(); > +---+++ > |account|first(product)()|first(probability)()| > +---+++ > | a| prod1| 0.6| > +---+++ > p: Unit = () > // Repeat and notice that result will occasionally be different > +---+++ > |account|first(product)()|first(probability)()| > +---+++ > | a| prod2| 0.4| > +---+++ > p: Unit = () > scala> > df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).explain(true); > == Parsed Logical Plan == > 'Aggregate ['account], > [unresolvedalias('account),(first('product)(),mode=Complete,isDistinct=false) > AS > first(product)()#523,(first('probability)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort [probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Analyzed Logical Plan == > account: string, first(product)(): string, first(probability)(): double > Aggregate [account#3], > [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS > first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort [probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Optimized Logical Plan == > Aggregate [account#3], > [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS > first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort [probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS 
product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Physical Plan == > SortBasedAggregate(key=[account#3], > functions=[(first(product#4)(),mode=Final,isDistinct=false),(first(probability#5)(),mode=Final,isDistinct=false)], > output=[account#3,first(product)()#523,first(probability)()#524]) > +- ConvertToSafe >+- Sort [account#3 ASC], false, 0 > +- TungstenExchange hashpartitioning(account#3,200), None > +- ConvertToUnsafe > +- SortBasedAggregate(key=[account#3], > functions=[(first(product#4)(),mode=Partial,isDistinct=false),(first(probability#5)(),mode=Partial,isDistinct=false)], > output=[account#3,first#532,valueSet#533,first#534,valueSet#535]) >+- ConvertToSafe > +- Sort [account#3 ASC], false, 0 > +- Sort [probability#5 DESC], true, 0 > +- ConvertToUnsafe >+- Exchange rangepartitioning(probability#5 > DESC,200), None >
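The repartition-plus-sortWithinPartitions workaround quoted in the comment above can be written out as a full spark-shell snippet (a sketch using the column names from the issue; as the comment notes, the API still does not guarantee this ordering, so it may break if the aggregation internals change):

```scala
import org.apache.spark.sql.functions.first

val df = sc.parallelize(Seq(
  ("a", "prod1", 0.6), ("a", "prod2", 0.4),
  ("a", "prod2", 0.4), ("a", "prod2", 0.4), ("a", "prod2", 0.4)
)).toDF("account", "product", "probability")

// Repartition by the grouping key so each group lands in a single partition,
// then sort within that partition before aggregating. In practice first()
// then sees the highest-probability row of each account first.
val result = df
  .repartition($"account")
  .sortWithinPartitions($"account", $"probability".desc)
  .groupBy($"account")
  .agg(first($"product"), first($"probability"))

result.show()
```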
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786878#comment-15786878 ] Devaraj K commented on SPARK-15359: --- Thanks [~yu2003w] for verifying this PR, I forgot to mention that it depends on SPARK-15288 [https://github.com/apache/spark/pull/13072] for handling the UncaughtException's, sorry for that. Can you verify this PR with the SPARK-15288 fix? > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during > the successful registration but if the mesosDriver.run() returns > DRIVER_ABORTED status after the successful register then there is no action > for the status and the thread will be terminated. > I think we need to throw the exception and shutdown the dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers
[ https://issues.apache.org/jira/browse/SPARK-19032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786824#comment-15786824 ] Liang-Chi Hsieh commented on SPARK-19032: - There is a related discussion at dev mailing list: http://apache-spark-developers-list.1001551.n3.nabble.com/Aggregating-over-sorted-data-tc1.html > Non-deterministic results using aggregation first across multiple workers > - > > Key: SPARK-19032 > URL: https://issues.apache.org/jira/browse/SPARK-19032 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 1.6.1 > Environment: Standalone Spark 1.6.1 cluster on EC2 with 2 worker > nodes, one executor each. >Reporter: Harry Weppner > > We've come across a situation results aggregated using {{first}} on a sorted > df are non-deterministic. Given the explanation for the plan there appears to > be a plausible explanation but creates more question on the usefulness of > these aggregation functions in a spark cluster. > Here's a minimal example to reproduce: > {code} > val df = > sc.parallelize(Seq(("a","prod1",0.6),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4))).toDF("account","product","probability") > var p = > df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).show(); > +---+++ > |account|first(product)()|first(probability)()| > +---+++ > | a| prod1| 0.6| > +---+++ > p: Unit = () > // Repeat and notice that result will occasionally be different > +---+++ > |account|first(product)()|first(probability)()| > +---+++ > | a| prod2| 0.4| > +---+++ > p: Unit = () > scala> > df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).explain(true); > == Parsed Logical Plan == > 'Aggregate ['account], > [unresolvedalias('account),(first('product)(),mode=Complete,isDistinct=false) > AS > first(product)()#523,(first('probability)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort 
[probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Analyzed Logical Plan == > account: string, first(product)(): string, first(probability)(): double > Aggregate [account#3], > [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS > first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort [probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Optimized Logical Plan == > Aggregate [account#3], > [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS > first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) > AS first(probability)()#524] > +- Sort [probability#5 DESC], true >+- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] > +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at > rddToDataFrameHolder at :27 > == Physical Plan == > SortBasedAggregate(key=[account#3], > functions=[(first(product#4)(),mode=Final,isDistinct=false),(first(probability#5)(),mode=Final,isDistinct=false)], > output=[account#3,first(product)()#523,first(probability)()#524]) > +- ConvertToSafe >+- Sort [account#3 ASC], false, 0 > +- TungstenExchange hashpartitioning(account#3,200), None > +- ConvertToUnsafe > +- SortBasedAggregate(key=[account#3], > functions=[(first(product#4)(),mode=Partial,isDistinct=false),(first(probability#5)(),mode=Partial,isDistinct=false)], > output=[account#3,first#532,valueSet#533,first#534,valueSet#535]) >+- ConvertToSafe > +- Sort [account#3 ASC], false, 0 > +- Sort [probability#5 DESC], true, 0 > +- ConvertToUnsafe >+- Exchange rangepartitioning(probability#5 > DESC,200), None > +- ConvertToSafe > +- Project [_1#0 AS account#3,_2#1 AS > product#4,_3#2 AS 
probability#5] > +- Scan ExistingRDD[_1#0,_2#1,_3#2] > {code} > My working hypothesis is that after {{TungstenExchange hashpartitioning}} the > _global_ sort order on {{probability}} is lost leading to
[jira] [Commented] (SPARK-18933) Different log output between Terminal screen and stderr file
[ https://issues.apache.org/jira/browse/SPARK-18933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786822#comment-15786822 ] Sean Wong commented on SPARK-18933: --- But there is no stderr or stdout file available for driver logs. Only executors have these two files. > Different log output between Terminal screen and stderr file > > > Key: SPARK-18933 > URL: https://issues.apache.org/jira/browse/SPARK-18933 > Project: Spark > Issue Type: Bug > Components: Deploy, Documentation, Web UI >Affects Versions: 1.6.3 > Environment: Yarn mode and standalone mode >Reporter: Sean Wong > Original Estimate: 612h > Remaining Estimate: 612h > > First of all, I use the default log4j.properties in the Spark conf/ > But I found that the log output(e.g., INFO) is different between Terminal > screen and stderr File. Some INFO logs exist in both of them. Some INFO logs > exist in either of them. Why this happens? Is it supposed that the output > logs are same between the terminal screen and stderr file? > Then I did a Test. I modified the source code in SparkContext.scala and add > one line log code "logInfo("This is textFile")" in the textFile function. > However, after running an application, I found the log "This is textFile" > shown in the terminal screen. no such log in the stderr file. I am not sure > if this is a bug. So, hope you can solve this question. Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-15359) Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run()
[ https://issues.apache.org/jira/browse/SPARK-15359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786813#comment-15786813 ] Jared commented on SPARK-15359: --- Hi, I tested the fix. However, it seemed the problem still existed. I1230 11:39:07.096375 6889 sched.cpp:1223] Aborting framework 16/12/30 11:39:07 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED 16/12/30 11:39:07 ERROR MesosClusterScheduler: driver.run() failed org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124) Exception in thread "MesosClusterScheduler-mesos-driver" org.apache.spark.SparkException: Error starting driver, DRIVER_ABORTED at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$$anon$1.run(MesosSchedulerUtils.scala:124) 16/12/30 11:39:07 INFO Utils: Successfully started service on port 7077. 16/12/30 11:39:07 INFO MesosRestServer: Started REST server for submitting applications on port 7077 It seemed that exceptions thrown was not handled. I think several other files should also be changed to fix this problem. > Mesos dispatcher should handle DRIVER_ABORTED status from mesosDriver.run() > --- > > Key: SPARK-15359 > URL: https://issues.apache.org/jira/browse/SPARK-15359 > Project: Spark > Issue Type: Bug > Components: Deploy, Mesos >Reporter: Devaraj K >Priority: Minor > > Mesos dispatcher handles DRIVER_ABORTED status for mesosDriver.run() during > the successful registration but if the mesosDriver.run() returns > DRIVER_ABORTED status after the successful register then there is no action > for the status and the thread will be terminated. > I think we need to throw the exception and shutdown the dispatcher. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
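The "throw the exception and shut down the dispatcher" behavior under discussion could look roughly like the following. This is a hedged sketch, not the actual patch: `mesosDriver` and `markErr` stand in for the real state in `MesosSchedulerUtils`, which is more involved.

```scala
import org.apache.mesos.Protos
import org.apache.spark.SparkException

// Inside the thread that runs the Mesos scheduler driver: if run() returns
// DRIVER_ABORTED after a successful registration, fail loudly so the
// dispatcher can shut down instead of the thread terminating silently.
val ret = mesosDriver.run()
logInfo(s"driver.run() returned with code $ret")
if (ret == Protos.Status.DRIVER_ABORTED) {
  markErr()  // hypothetical helper: records the failure and wakes up waiters
  throw new SparkException(s"Error starting driver, $ret")
}
```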
[jira] [Commented] (SPARK-19026) local directories cannot be cleaned up when creating an "executor-***" directory throws an IOException (e.g., no free disk space left)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786773#comment-15786773 ] Apache Spark commented on SPARK-19026: -- User 'zuotingbing' has created a pull request for this issue: https://github.com/apache/spark/pull/16439 > local directories cannot be cleanuped when create directory of "executor-***" > throws IOException such as there is no more free disk space to create it etc. > --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing > > i set SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > when there is no more free disk space on "/data4/spark/tmp" , other local > directories (/data2/spark/tmp,/data3/spark/tmp) cannot be cleanuped when my > application finished. > we should catch the IOExecption when create local dirs throws execption , > otherwise the variable "appDirectories(appId)" not be set , then local > directories "executor-***" cannot be deleted for this application. If the > number of folders "executor-***" > 32k we cannot create executor anymore on > this worker node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19026) local directories cannot be cleaned up when creating an "executor-***" directory throws an IOException (e.g., no free disk space left)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19026: Assignee: (was: Apache Spark) > local directories cannot be cleanuped when create directory of "executor-***" > throws IOException such as there is no more free disk space to create it etc. > --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing > > i set SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > when there is no more free disk space on "/data4/spark/tmp" , other local > directories (/data2/spark/tmp,/data3/spark/tmp) cannot be cleanuped when my > application finished. > we should catch the IOExecption when create local dirs throws execption , > otherwise the variable "appDirectories(appId)" not be set , then local > directories "executor-***" cannot be deleted for this application. If the > number of folders "executor-***" > 32k we cannot create executor anymore on > this worker node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19026) local directories cannot be cleaned up when creating an "executor-***" directory throws an IOException (e.g., no free disk space left)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19026: Assignee: Apache Spark > local directories cannot be cleanuped when create directory of "executor-***" > throws IOException such as there is no more free disk space to create it etc. > --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing >Assignee: Apache Spark > > i set SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > when there is no more free disk space on "/data4/spark/tmp" , other local > directories (/data2/spark/tmp,/data3/spark/tmp) cannot be cleanuped when my > application finished. > we should catch the IOExecption when create local dirs throws execption , > otherwise the variable "appDirectories(appId)" not be set , then local > directories "executor-***" cannot be deleted for this application. If the > number of folders "executor-***" > 32k we cannot create executor anymore on > this worker node. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
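The fix proposed in the description, catching the IOException per directory so one full disk does not block cleanup of the others, could be sketched as follows (helper and variable names are hypothetical; the real change lives in the Worker's directory-creation path):

```scala
import java.io.{File, IOException}

// Try to create an "executor-*" directory under each configured local dir,
// but catch IOExceptions per directory. The successfully created dirs are
// still returned, so they can be registered in appDirectories(appId) and
// deleted when the application finishes.
def createExecutorDirs(localDirs: Seq[String], appId: String): Seq[File] =
  localDirs.flatMap { dir =>
    try {
      val executorDir = new File(dir, s"executor-$appId")
      if (!executorDir.exists() && !executorDir.mkdirs()) {
        throw new IOException(s"Failed to create directory $executorDir")
      }
      Some(executorDir)
    } catch {
      case _: IOException =>
        // Log and skip this directory instead of aborting registration
        // of the remaining directories.
        None
    }
  }
```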
[jira] [Updated] (SPARK-18974) FileInputDStream could not detect files which were moved into the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Wang updated SPARK-18974: -- Description: FileInputDStream use mod time to find new files, but if a file was moved into the directories it's modification time would not be changed, so FileInputDStream could not detect these files. (was: FileInputDStream use mod time to find new files, but if a file was moved into the directories it's modification time would not be changed, so FileInputDStream could not detect these files. I think a way to fix this bug is get access_time and do judgment, bug it need a Set of files to save all old files, it would very inefficient for lot of files directory.) > FileInputDStream could not detected files which moved to the directory > --- > > Key: SPARK-18974 > URL: https://issues.apache.org/jira/browse/SPARK-18974 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 >Reporter: Adam Wang > > FileInputDStream use mod time to find new files, but if a file was moved into > the directories it's modification time would not be changed, so > FileInputDStream could not detect these files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18974) FileInputDStream could not detect files which were moved into the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786735#comment-15786735 ] Adam Wang commented on SPARK-18974: --- Thanks for reminding, I haven't tried before, I will try later > FileInputDStream could not detected files which moved to the directory > --- > > Key: SPARK-18974 > URL: https://issues.apache.org/jira/browse/SPARK-18974 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 >Reporter: Adam Wang > > FileInputDStream use mod time to find new files, but if a file was moved into > the directories it's modification time would not be changed, so > FileInputDStream could not detect these files. > I think a way to fix this bug is get access_time and do judgment, bug it need > a Set of files to save all old files, it would very inefficient for lot of > files directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
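A common workaround on the producer side (a sketch, not part of Spark; the path is hypothetical) is to refresh the file's modification time after moving it into the monitored directory, so FileInputDStream's mod-time filter picks it up:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// A move within the same filesystem preserves the original mtime, which is
// what hides the file from FileInputDStream. Bumping the mtime to "now"
// after the move makes it visible to the next batch.
val fs = FileSystem.get(new Configuration())
val moved = new Path("/data/stream-in/part-00000")  // hypothetical path
fs.setTimes(moved, System.currentTimeMillis(), -1)  // mtime = now, atime unchanged
```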
[jira] [Commented] (SPARK-12757) Use reference counting to prevent blocks from being evicted during reads
[ https://issues.apache.org/jira/browse/SPARK-12757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786726#comment-15786726 ] Felix Cheung commented on SPARK-12757: -- ping. Still seeing a lot of these messages on Spark 2.1. Is that a new issue? > Use reference counting to prevent blocks from being evicted during reads > > > Key: SPARK-12757 > URL: https://issues.apache.org/jira/browse/SPARK-12757 > Project: Spark > Issue Type: Improvement > Components: Block Manager >Reporter: Josh Rosen >Assignee: Josh Rosen > Fix For: 2.0.0 > > > As a pre-requisite to off-heap caching of blocks, we need a mechanism to > prevent pages / blocks from being evicted while they are being read. With > on-heap objects, evicting a block while it is being read merely leads to > memory-accounting problems (because we assume that an evicted block is a > candidate for garbage-collection, which will not be true during a read), but > with off-heap memory this will lead to either data corruption or segmentation > faults. > To address this, we should add a reference-counting mechanism to track which > blocks/pages are being read in order to prevent them from being evicted > prematurely. I propose to do this in two phases: first, add a safe, > conservative approach in which all BlockManager.get*() calls implicitly > increment the reference count of blocks and where tasks' references are > automatically freed upon task completion. This will be correct but may have > adverse performance impacts because it will prevent legitimate block > evictions. In phase two, we should incrementally add release() calls in order > to fix the eviction of unreferenced blocks. The latter change may need to > touch many different components, which is why I propose to do it separately > in order to make the changes easier to reason about and review. 
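The two-phase plan in the description amounts to a pin/release protocol on blocks. A minimal sketch of the idea (not Spark's actual BlockManager code): get*() pins a block, release() unpins it, and eviction is only legal at a pin count of zero.

```scala
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicInteger

// Minimal pin-count registry illustrating the reference-counting idea.
class BlockPinRegistry {
  private val pins = new ConcurrentHashMap[String, AtomicInteger]()

  // Called implicitly by every read of the block (phase one of the plan).
  def pin(blockId: String): Unit = {
    pins.putIfAbsent(blockId, new AtomicInteger(0))
    pins.get(blockId).incrementAndGet()
  }

  // Called on task completion, or explicitly once release() calls are
  // added throughout the code (phase two of the plan).
  def release(blockId: String): Unit = {
    val c = pins.get(blockId)
    if (c != null) c.decrementAndGet()
  }

  // Eviction must skip any block that is still being read.
  def canEvict(blockId: String): Boolean = {
    val c = pins.get(blockId)
    c == null || c.get() <= 0
  }
}
```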
[jira] [Resolved] (SPARK-17766) Write ahead log exception on a toy project
[ https://issues.apache.org/jira/browse/SPARK-17766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-17766. -- Resolution: Duplicate This has been fixed in SPARK-18617 > Write ahead log exception on a toy project > -- > > Key: SPARK-17766 > URL: https://issues.apache.org/jira/browse/SPARK-17766 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 2.0.0 >Reporter: Nadav Samet >Priority: Minor > > Write ahead log seems to get corrupted when the application is stopped > abruptly (Ctrl-C, or kill). Then, the application refuses to run due to this > exception: > {code} > 2016-10-03 08:03:32,321 ERROR [Executor task launch worker-1] > executor.Executor: Exception in task 0.0 in stage 1.0 (TID 1) > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > ...skipping... > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) > at > org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:192) > at > org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47) > at org.apache.spark.scheduler.Task.run(Task.scala:85) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at 
java.lang.Thread.run(Thread.java:745) > {code} > Code: > {code} > import org.apache.hadoop.conf.Configuration > import org.apache.spark._ > import org.apache.spark.streaming._ > object ProtoDemo { > def createContext(dirName: String) = { > val conf = new SparkConf().setAppName("mything").setMaster("local[4]") > conf.set("spark.streaming.receiver.writeAheadLog.enable", "true") > /* > conf.set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", > "true") > conf.set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", > "true") > */ > val ssc = new StreamingContext(conf, Seconds(1)) > ssc.checkpoint(dirName) > val lines = ssc.socketTextStream("127.0.0.1", ) > val words = lines.flatMap(_.split(" ")) > val pairs = words.map(word => (word, 1)) > val wordCounts = pairs.reduceByKey(_ + _) > val runningCounts = wordCounts.updateStateByKey[Int] { > (values: Seq[Int], oldValue: Option[Int]) => > val s = values.sum > Some(oldValue.fold(s)(_ + s)) > } > // Print the first ten elements of each RDD generated in this DStream to > the console > runningCounts.print() > ssc > } > def main(args: Array[String]) = { > val hadoopConf = new Configuration() > val dirName = "/tmp/chkp" > val ssc = StreamingContext.getOrCreate(dirName, () => > createContext(dirName), hadoopConf) > ssc.start() > ssc.awaitTermination() > } > } > {code} > Steps to reproduce: > 1. I put the code in a repository: git clone > https://github.com/thesamet/spark-issue > 2. in one terminal: {{ while true; do nc -l localhost ; done}} > 3. Start a new terminal > 4. Run "sbt run". > 5. Type a few lines in the netcat terminal. > 6. Kill the streaming project (Ctrl-C), > 7. Go back to step 4 until you see the exception above. > I tried the above with local filesystem and also with S3, and getting the > same result. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19032) Non-deterministic results using aggregation first across multiple workers
Harry Weppner created SPARK-19032: - Summary: Non-deterministic results using aggregation first across multiple workers Key: SPARK-19032 URL: https://issues.apache.org/jira/browse/SPARK-19032 Project: Spark Issue Type: Bug Components: Optimizer Affects Versions: 1.6.1 Environment: Standalone Spark 1.6.1 cluster on EC2 with 2 worker nodes, one executor each. Reporter: Harry Weppner We've come across a situation results aggregated using {{first}} on a sorted df are non-deterministic. Given the explanation for the plan there appears to be a plausible explanation but creates more question on the usefulness of these aggregation functions in a spark cluster. Here's a minimal example to reproduce: {code} val df = sc.parallelize(Seq(("a","prod1",0.6),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4),("a","prod2",0.4))).toDF("account","product","probability") var p = df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).show(); +---+++ |account|first(product)()|first(probability)()| +---+++ | a| prod1| 0.6| +---+++ p: Unit = () // Repeat and notice that result will occasionally be different +---+++ |account|first(product)()|first(probability)()| +---+++ | a| prod2| 0.4| +---+++ p: Unit = () scala> df.sort($"probability".desc).groupBy($"account").agg(first($"product"),first($"probability")).explain(true); == Parsed Logical Plan == 'Aggregate ['account], [unresolvedalias('account),(first('product)(),mode=Complete,isDistinct=false) AS first(product)()#523,(first('probability)(),mode=Complete,isDistinct=false) AS first(probability)()#524] +- Sort [probability#5 DESC], true +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27 == Analyzed Logical Plan == account: string, first(product)(): string, first(probability)(): double Aggregate [account#3], [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS 
first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) AS first(probability)()#524] +- Sort [probability#5 DESC], true +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27 == Optimized Logical Plan == Aggregate [account#3], [account#3,(first(product#4)(),mode=Complete,isDistinct=false) AS first(product)()#523,(first(probability#5)(),mode=Complete,isDistinct=false) AS first(probability)()#524] +- Sort [probability#5 DESC], true +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] +- LogicalRDD [_1#0,_2#1,_3#2], MapPartitionsRDD[1] at rddToDataFrameHolder at :27 == Physical Plan == SortBasedAggregate(key=[account#3], functions=[(first(product#4)(),mode=Final,isDistinct=false),(first(probability#5)(),mode=Final,isDistinct=false)], output=[account#3,first(product)()#523,first(probability)()#524]) +- ConvertToSafe +- Sort [account#3 ASC], false, 0 +- TungstenExchange hashpartitioning(account#3,200), None +- ConvertToUnsafe +- SortBasedAggregate(key=[account#3], functions=[(first(product#4)(),mode=Partial,isDistinct=false),(first(probability#5)(),mode=Partial,isDistinct=false)], output=[account#3,first#532,valueSet#533,first#534,valueSet#535]) +- ConvertToSafe +- Sort [account#3 ASC], false, 0 +- Sort [probability#5 DESC], true, 0 +- ConvertToUnsafe +- Exchange rangepartitioning(probability#5 DESC,200), None +- ConvertToSafe +- Project [_1#0 AS account#3,_2#1 AS product#4,_3#2 AS probability#5] +- Scan ExistingRDD[_1#0,_2#1,_3#2] {code} My working hypothesis is that after {{TungstenExchange hashpartitioning}} the _global_ sort order on {{probability}} is lost leading to non-deterministic results. If this hypothesis is valid, then how useful are aggregation functions such as {{first}}, {{last}} and possibly others in Spark? 
It appears that the use of window functions could address the ambiguity by making the partitions explicit but I'd be interested in your assessment. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail:
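The window-function alternative mentioned above can be sketched like this (assuming the same df from the reproduction; row_number over a window partitioned by account makes the per-group ordering explicit instead of relying on first() after a global sort):

```scala
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number

// Pick the top-probability row per account deterministically: the ordering
// is attached to the window itself, so no cross-partition sort order needs
// to survive a shuffle.
val w = Window.partitionBy($"account").orderBy($"probability".desc)
val top = df
  .withColumn("rn", row_number().over(w))
  .where($"rn" === 1)
  .drop("rn")
top.show()
```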
[jira] [Created] (SPARK-19031) JDBC Streaming Source
Michael Armbrust created SPARK-19031: Summary: JDBC Streaming Source Key: SPARK-19031 URL: https://issues.apache.org/jira/browse/SPARK-19031 Project: Spark Issue Type: New Feature Components: Structured Streaming Reporter: Michael Armbrust Many RDBMs provide the ability to capture changes to a table (change data capture). We should make this available as a streaming source. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18942) Support output operations for kinesis
[ https://issues.apache.org/jira/browse/SPARK-18942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takeshi Yamamuro resolved SPARK-18942. -- Resolution: Won't Fix > Support output operations for kinesis > - > > Key: SPARK-18942 > URL: https://issues.apache.org/jira/browse/SPARK-18942 > Project: Spark > Issue Type: New Feature > Components: DStreams >Affects Versions: 2.0.2 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Spark does not support output operations (e.g. DStream#saveAsTextFile) for > Kinesis. So, officially supporting this is useful for some AWS users, I > think. An usage of the output operations is assumed as follows; > {code} > // Import a class that includes an output function > scala> import org.apache.spark.streaming.kinesis.KinesisDStreamFunctions._ > // Create a DStream > scala> val stream: DStream[String] = ... > // Define a handler to convert the DStream type for output > scala> val msgHandler = (s: String) => s.getBytes("UTF-8") > // Define the output operation > scala> kinesisStream.count().saveAsKinesisStream(streamName, endpointUrl, > msgHandler) > {code} > A prototype I made is here: > https://github.com/apache/spark/compare/master...maropu:OutputOpForKinesis -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18942) Support output operations for kinesis
[ https://issues.apache.org/jira/browse/SPARK-18942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786515#comment-15786515 ] Takeshi Yamamuro commented on SPARK-18942: -- okay, I'll put 'won't fix' in this ticket and thanks! Since I make some kinds of kinesis integration in my repo (https://github.com/maropu/spark-kinesis-sql-asl#output-operation-for-spark-streaming), I'll put this in SparkPackage in future. > Support output operations for kinesis > - > > Key: SPARK-18942 > URL: https://issues.apache.org/jira/browse/SPARK-18942 > Project: Spark > Issue Type: New Feature > Components: DStreams >Affects Versions: 2.0.2 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Spark does not support output operations (e.g. DStream#saveAsTextFile) for > Kinesis. So, officially supporting this is useful for some AWS users, I > think. An usage of the output operations is assumed as follows; > {code} > // Import a class that includes an output function > scala> import org.apache.spark.streaming.kinesis.KinesisDStreamFunctions._ > // Create a DStream > scala> val stream: DStream[String] = ... > // Define a handler to convert the DStream type for output > scala> val msgHandler = (s: String) => s.getBytes("UTF-8") > // Define the output operation > scala> kinesisStream.count().saveAsKinesisStream(streamName, endpointUrl, > msgHandler) > {code} > A prototype I made is here: > https://github.com/apache/spark/compare/master...maropu:OutputOpForKinesis -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786390#comment-15786390 ] Devesh Parekh commented on SPARK-18693: --- I suggest this is more appropriately classified as a bug rather than an improvement. Users who follow the documentation to use CrossValidator for model selection with these evaluators and weighted input will get wrong results. At the very least, the user should be warned in the documentation that the results will be wrong if they fit a weight-aware model on weighted input and use these existing evaluators in CrossValidator. With that warning in place, making the evaluators work on weighted input would then be an improvement. > BinaryClassificationEvaluator, RegressionEvaluator, and > MulticlassClassificationEvaluator should use sample weight data > --- > > Key: SPARK-18693 > URL: https://issues.apache.org/jira/browse/SPARK-18693 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.0.2 >Reporter: Devesh Parekh > > The LogisticRegression and LinearRegression models support training with a > weight column, but the corresponding evaluators do not support computing > metrics using those weights. This breaks model selection using CrossValidator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
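To see concretely why an evaluator that ignores weights breaks model selection, here is a pure-Python sketch with no Spark dependency. The data, weights, and the two "models" are made up for illustration: the unweighted metric prefers one model while the weighted metric prefers the other, so CrossValidator would pick the wrong one.

```python
def accuracy(labels, preds, weights=None):
    """Weighted accuracy: total weight on correct rows / total weight.
    With weights=None every row counts equally, which is effectively
    what an evaluator that ignores the weight column computes."""
    if weights is None:
        weights = [1.0] * len(labels)
    correct = sum(w for y, p, w in zip(labels, preds, weights) if y == p)
    return correct / sum(weights)

labels  = [1, 1, 0, 0]
weights = [10.0, 10.0, 1.0, 1.0]  # the first two rows dominate

model_a = [1, 1, 1, 1]  # right only on the heavily weighted rows
model_b = [1, 0, 0, 0]  # right on more rows, but mostly light ones

# Unweighted: model_b looks better (0.75 vs 0.5).
# Weighted:   model_a is better (20/22 vs 12/22).
```

Under the weighted metric the ranking flips, which is the "wrong results" failure mode described above.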
[jira] [Commented] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786368#comment-15786368 ] Egor Pahomov commented on SPARK-18930: -- I'm not sure that such a restriction, buried in the documentation, is OK. Basically the problem is: I've created a correct schema for the table and inserted into it correctly, but for some reason I need to keep a particular order of columns in the select statement. > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can see, these numbers are the counts of records. But when I do select * > from temp.test_partitioning_4 the data is correct.
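The surprising directory names come from the Hive-style INSERT rule: selected columns are mapped onto the target schema by position, not by name, with partition columns last. A pure-Python sketch of that positional rule (table layout and values taken from the report above; `insert_by_position` is an illustrative helper, not a Spark API):

```python
def insert_by_position(target_columns, rows):
    """Map each selected row onto the target schema positionally,
    ignoring any source column names -- the Hive/Spark INSERT rule."""
    return [dict(zip(target_columns, row)) for row in rows]

# Target schema: data column `num`, then the partition column `day` last.
target = ["num", "day"]

# SELECT day, count(*) AS num  ->  tuples arrive as (day_value, count_value)
selected = [("2016-04-01", 62456298), ("2016-04-02", 69094345)]

rows = insert_by_position(target, selected)
# The day string lands in `num`, and the count becomes the partition value,
# producing HDFS directories like day=62456298 as reported above.
```

Putting the partition column last in the SELECT (`select count(*) as num, day ... group by day`) would make the positional mapping line up with the intended schema.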
[jira] [Commented] (SPARK-18813) MLlib 2.2 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786319#comment-15786319 ] Joseph K. Bradley commented on SPARK-18813: --- I just added links to the categories listed above to help with maintenance. Given feedback, I'll go ahead and update the text above to confirm that the proposed roadmap process will be used. But further feedback is welcome. Some JIRAs likely do not yet follow the process proposal (e.g., lacking shepherds). I'll start trying to ping on those JIRAs which need to be updated. > MLlib 2.2 Roadmap > - > > Key: SPARK-18813 > URL: https://issues.apache.org/jira/browse/SPARK-18813 > Project: Spark > Issue Type: Umbrella > Components: ML, MLlib >Reporter: Joseph K. Bradley >Priority: Blocker > Labels: roadmap > > *PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.* > The roadmap process described below is significantly updated since the 2.1 > roadmap [SPARK-15581]. Please refer to [SPARK-15581] for more discussion on > the basis for this proposal, and comment in this JIRA if you have suggestions > for improvements. > h1. Roadmap process > This roadmap is a master list for MLlib improvements we are working on during > this release. This includes ML-related changes in PySpark and SparkR. > *What is planned for the next release?* > * This roadmap lists issues which at least one Committer has prioritized. > See details below in "Instructions for committers." > * This roadmap only lists larger or more critical issues. > *How can contributors influence this roadmap?* > * If you believe an issue should be in this roadmap, please discuss the issue > on JIRA and/or the dev mailing list. Make sure to ping Committers since at > least one must agree to shepherd the issue. > * For general discussions, use this JIRA or the dev mailing list. For > specific issues, please comment on those issues or the mailing list. > * Vote for & watch issues which are important to you. 
> ** MLlib, sorted by: [Votes | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC] > or [Watchers | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC] > ** SparkR, sorted by: [Votes | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC] > or [Watchers | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC] > h2. Target Version and Priority > This section describes the meaning of Target Version and Priority. _These > meanings have been updated in this proposal for the 2.2 process._ > || Category | Target Version | Priority | Shepherd | Put on roadmap? | In > next release? 
|| > | [1 | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] > | next release | Blocker | *must* | *must* | *must* | > | [2 | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] > | next release | Critical | *must* | yes, unless small | *best effort* | > | [3 | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] > | next release | Major | *must* | optional | *best effort* | > | [4 | > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] > | next release | Minor | optional | no | maybe | > | [5 | >
[jira] [Updated] (SPARK-18813) MLlib 2.2 Roadmap
[ https://issues.apache.org/jira/browse/SPARK-18813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18813: -- Description: *PROPOSAL: This includes a proposal for the 2.2 roadmap process for MLlib.* The roadmap process described below is significantly updated since the 2.1 roadmap [SPARK-15581]. Please refer to [SPARK-15581] for more discussion on the basis for this proposal, and comment in this JIRA if you have suggestions for improvements. h1. Roadmap process This roadmap is a master list for MLlib improvements we are working on during this release. This includes ML-related changes in PySpark and SparkR. *What is planned for the next release?* * This roadmap lists issues which at least one Committer has prioritized. See details below in "Instructions for committers." * This roadmap only lists larger or more critical issues. *How can contributors influence this roadmap?* * If you believe an issue should be in this roadmap, please discuss the issue on JIRA and/or the dev mailing list. Make sure to ping Committers since at least one must agree to shepherd the issue. * For general discussions, use this JIRA or the dev mailing list. For specific issues, please comment on those issues or the mailing list. * Vote for & watch issues which are important to you. 
** MLlib, sorted by: [Votes | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC] or [Watchers | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC] ** SparkR, sorted by: [Votes | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC] or [Watchers | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC] h2. Target Version and Priority This section describes the meaning of Target Version and Priority. _These meanings have been updated in this proposal for the 2.2 process._ || Category | Target Version | Priority | Shepherd | Put on roadmap? | In next release? 
|| | [1 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Blocker | *must* | *must* | *must* | | [2 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Critical | *must* | yes, unless small | *best effort* | | [3 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Major | *must* | optional | *best effort* | | [4 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Minor | optional | no | maybe | | [5 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20%3D%202.2.0] | next release | Trivial | optional | no | maybe | | [6 | 
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC] | (empty) | (any) | yes | no | maybe | | [7 | https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(EMPTY)%20AND%20Shepherd%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC] | (empty) | (any) |
[jira] [Commented] (SPARK-19026) local directories cannot be cleaned up when creating an "executor-***" directory throws an IOException (e.g. when there is no more free disk space)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786276#comment-15786276 ] Sean Owen commented on SPARK-19026: --- Can you clarify? I'm not sure what you're proposing here. Maybe a PR is the best way to express it. > local directories cannot be cleaned up when creating an "executor-***" > directory throws an IOException (e.g. when there is no more free disk space) > --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing > > I set the SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > When there is no more free disk space on "/data4/spark/tmp", the other local > directories (/data2/spark/tmp, /data3/spark/tmp) cannot be cleaned up when my > application finishes. > We should catch the IOException thrown when creating the local dirs; > otherwise the variable "appDirectories(appId)" is never set, and the local > directories "executor-***" cannot be deleted for this application. If the > number of "executor-***" folders exceeds 32k, we cannot create any more executors on > this worker node.
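The fix the reporter is hinting at, recording the directories that were actually created even when a later mkdir fails, can be sketched without Spark. `create_local_dirs` is a hypothetical helper; the real change would live in the Worker's directory-creation code around `appDirectories(appId)`.

```python
import os
import tempfile

def create_local_dirs(paths):
    """Try to create every configured local dir. Directories that were
    created successfully are still returned (so cleanup can find them)
    instead of being lost when a single mkdir raises, e.g. disk full."""
    created, errors = [], []
    for p in paths:
        try:
            os.makedirs(p, exist_ok=True)
            created.append(p)
        except OSError as e:  # ENOSPC, permission errors, etc.
            errors.append((p, e))
    return created, errors

# Demo: one path is creatable, the other fails because its "parent"
# is a regular file (standing in for a full disk).
base = tempfile.mkdtemp()
blocker = os.path.join(base, "blocker")
open(blocker, "w").close()
created, errors = create_local_dirs(
    [os.path.join(base, "data2"), os.path.join(blocker, "data4")])
```

Because the successfully created directory is still tracked, a cleanup pass over `created` can delete it, which is the behavior the report says is currently lost.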
[jira] [Commented] (SPARK-18930) Inserting in partitioned table - partitioned field should be last in select statement.
[ https://issues.apache.org/jira/browse/SPARK-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786241#comment-15786241 ] Sean Owen commented on SPARK-18930: --- I don't know enough to say that myself. [~epahomov] What's the actual problem here? You say it seems to work correctly. > Inserting in partitioned table - partitioned field should be last in select > statement. > --- > > Key: SPARK-18930 > URL: https://issues.apache.org/jira/browse/SPARK-18930 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2 >Reporter: Egor Pahomov > > CREATE TABLE temp.test_partitioning_4 ( > num string > ) > PARTITIONED BY ( > day string) > stored as parquet > INSERT INTO TABLE temp.test_partitioning_4 PARTITION (day) > select day, count(*) as num from > hss.session where year=2016 and month=4 > group by day > Resulting schema on HDFS: /temp.db/test_partitioning_3/day=62456298, > emp.db/test_partitioning_3/day=69094345 > As you can see, these numbers are the counts of records. But when I do select * > from temp.test_partitioning_4 the data is correct.
[jira] [Updated] (SPARK-19003) Add Java examples in "Spark Streaming Guide", section "Design Patterns for using foreachRDD"
[ https://issues.apache.org/jira/browse/SPARK-19003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-19003: -- Assignee: Tushar Adeshara > Add Java examples in "Spark Streaming Guide", section "Design Patterns for > using foreachRDD" > - > > Key: SPARK-19003 > URL: https://issues.apache.org/jira/browse/SPARK-19003 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.2.0 >Reporter: Tushar Adeshara >Assignee: Tushar Adeshara >Priority: Minor > Fix For: 2.1.1, 2.2.0 > > > The page http://spark.apache.org/docs/latest/streaming-programming-guide.html > is missing Java example in section "Design Patterns for using foreachRDD". > Except this section, the page has Scala, Java and Python examples for all > other sections, so would be good to add for consistency. > I have made required code changes, will raise a pull request against this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-19003) Add Java examples in "Spark Streaming Guide", section "Design Patterns for using foreachRDD"
[ https://issues.apache.org/jira/browse/SPARK-19003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19003. --- Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 Issue resolved by pull request 16408 [https://github.com/apache/spark/pull/16408] > Add Java examples in "Spark Streaming Guide", section "Design Patterns for > using foreachRDD" > - > > Key: SPARK-19003 > URL: https://issues.apache.org/jira/browse/SPARK-19003 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 2.2.0 >Reporter: Tushar Adeshara >Priority: Minor > Fix For: 2.1.1, 2.2.0 > > > The page http://spark.apache.org/docs/latest/streaming-programming-guide.html > is missing Java example in section "Design Patterns for using foreachRDD". > Except this section, the page has Scala, Java and Python examples for all > other sections, so would be good to add for consistency. > I have made required code changes, will raise a pull request against this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-18693: -- Issue Type: Improvement (was: Bug) > BinaryClassificationEvaluator, RegressionEvaluator, and > MulticlassClassificationEvaluator should use sample weight data > --- > > Key: SPARK-18693 > URL: https://issues.apache.org/jira/browse/SPARK-18693 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.0.2 >Reporter: Devesh Parekh > > The LogisticRegression and LinearRegression models support training with a > weight column, but the corresponding evaluators do not support computing > metrics using those weights. This breaks model selection using CrossValidator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18698: -- Assignee: Ilya Matiach > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Bjoern Toldbod >Assignee: Ilya Matiach >Priority: Minor > Fix For: 2.2.0 > > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipelinestages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-18698. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16436 [https://github.com/apache/spark/pull/16436] > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Bjoern Toldbod >Priority: Minor > Fix For: 2.2.0 > > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipelinestages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19030) Dropped event errors being reported after SparkContext has been stopped
michael procopio created SPARK-19030: Summary: Dropped event errors being reported after SparkContext has been stopped Key: SPARK-19030 URL: https://issues.apache.org/jira/browse/SPARK-19030 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.2 Environment: Debian 8 using spark-submit with MATLAB integration; Spark code is written using Java. Reporter: michael procopio Priority: Minor After stop has been called on SparkContext, errors are being reported: 6/12/29 15:54:04 ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(2,WrappedArray()) The stack in the heartbeat thread at the point where the error is thrown is: Daemon Thread [heartbeat-receiver-event-loop-thread] (Suspended (breakpoint at line 124 in LiveListenerBus)) LiveListenerBus.post(SparkListenerEvent) line: 124 DAGScheduler.executorHeartbeatReceived(String, Tuple4
[jira] [Commented] (SPARK-16402) JDBC source: Implement save API
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786114#comment-15786114 ] Xiao Li commented on SPARK-16402: - Yes. > JDBC source: Implement save API > --- > > Key: SPARK-16402 > URL: https://issues.apache.org/jira/browse/SPARK-16402 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Currently, we are unable to call the `save` API of `DataFrameWriter` when the > source is JDBC. For example, > {noformat} > df.write > .format("jdbc") > .option("url", url1) > .option("dbtable", "TEST.TRUNCATETEST") > .option("user", "testUser") > .option("password", "testPass") > .save() > {noformat} > The error message users will get is like > {noformat} > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > java.lang.RuntimeException: > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > {noformat} > However, the `save` API is very common for all the data sources, like parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Closed] (SPARK-16402) JDBC source: Implement save API
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-16402. --- Resolution: Duplicate > JDBC source: Implement save API > --- > > Key: SPARK-16402 > URL: https://issues.apache.org/jira/browse/SPARK-16402 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Currently, we are unable to call the `save` API of `DataFrameWriter` when the > source is JDBC. For example, > {noformat} > df.write > .format("jdbc") > .option("url", url1) > .option("dbtable", "TEST.TRUNCATETEST") > .option("user", "testUser") > .option("password", "testPass") > .save() > {noformat} > The error message users will get is like > {noformat} > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > java.lang.RuntimeException: > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > {noformat} > However, the `save` API is very common for all the data sources, like parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19029) Remove databaseName from SimpleCatalogRelation
[ https://issues.apache.org/jira/browse/SPARK-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786109#comment-15786109 ] Apache Spark commented on SPARK-19029: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/16438 > Remove databaseName from SimpleCatalogRelation > --- > > Key: SPARK-19029 > URL: https://issues.apache.org/jira/browse/SPARK-19029 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Remove useless `databaseName ` from `SimpleCatalogRelation`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19029) Remove databaseName from SimpleCatalogRelation
[ https://issues.apache.org/jira/browse/SPARK-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19029: Assignee: Xiao Li (was: Apache Spark) > Remove databaseName from SimpleCatalogRelation > --- > > Key: SPARK-19029 > URL: https://issues.apache.org/jira/browse/SPARK-19029 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Remove useless `databaseName ` from `SimpleCatalogRelation`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19029) Remove databaseName from SimpleCatalogRelation
[ https://issues.apache.org/jira/browse/SPARK-19029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19029: Assignee: Apache Spark (was: Xiao Li) > Remove databaseName from SimpleCatalogRelation > --- > > Key: SPARK-19029 > URL: https://issues.apache.org/jira/browse/SPARK-19029 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Xiao Li >Assignee: Apache Spark > > Remove useless `databaseName ` from `SimpleCatalogRelation`. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19029) Remove databaseName from SimpleCatalogRelation
Xiao Li created SPARK-19029: --- Summary: Remove databaseName from SimpleCatalogRelation Key: SPARK-19029 URL: https://issues.apache.org/jira/browse/SPARK-19029 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.1.0 Reporter: Xiao Li Assignee: Xiao Li Remove useless `databaseName` from `SimpleCatalogRelation`.
[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-19012: -- Affects Version/s: 2.1.0 > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.0.2, 2.1.0 >Reporter: Jork Zijlstra >Assignee: Dongjoon Hyun > Fix For: 2.2.0 > > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 
'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786052#comment-15786052 ] Herman van Hovell commented on SPARK-19012: --- Ok, you could also start a table name with {{tbl_}} and that would also make the problem go away. > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra >Assignee: Dongjoon Hyun > Fix For: 2.2.0 > > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', 
> 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
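As the comment above suggests, prefixing the view name avoids the parse error. A minimal sketch of that workaround, using a hypothetical `safeViewName` helper (not a Spark API):

```scala
// Hypothetical helper (not part of Spark's API): the SQL parser rejects
// unquoted identifiers that start with a digit, so prefix such names
// before passing them to createOrReplaceTempView.
def safeViewName(name: String): String =
  if (name.nonEmpty && name.head.isDigit) s"tbl_$name" else name

// Usage (requires a SparkSession; shown for illustration only):
// sparkSession.read.orc(path).createOrReplaceTempView(safeViewName("1468079114"))
```

The prefix "tbl_" follows the workaround suggested in the comment; any prefix that starts the identifier with a letter would do.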
[jira] [Resolved] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-19012. --- Resolution: Fixed Assignee: Dongjoon Hyun Fix Version/s: 2.2.0 > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra >Assignee: Dongjoon Hyun > Fix For: 2.2.0 > > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 
'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-18669) Update Apache docs regard watermarking in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-18669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-18669. -- Resolution: Fixed Fix Version/s: 2.2.0 2.1.1 > Update Apache docs regard watermarking in Structured Streaming > -- > > Key: SPARK-18669 > URL: https://issues.apache.org/jira/browse/SPARK-18669 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Reporter: Tathagata Das >Assignee: Tathagata Das > Fix For: 2.1.1, 2.2.0 > > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786015#comment-15786015 ] Dongjoon Hyun commented on SPARK-19012: --- Yep. I tried to update the annotation but unfortunately it was reverted that now. (You can see that in my PR.) > Maybe updating the annotation of the method would also be enough. Having an > Exception with a clear reason would definitely already a fix. Changing annotation on `public` API seems to be handled in a different issue with some more discussion because it affects many other codes (e.g. examples). > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 
'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786013#comment-15786013 ] Ilya Matiach commented on SPARK-18693: -- Many classifiers in ML don't seem to support weight columns yet, so other JIRAs probably need to be created to add weight columns to them (e.g. DecisionTreeClassifier). Also, it doesn't look like any packages in MLLIB contain weight columns, so I should probably try to limit the changes to ML only, but that is difficult since the ML evaluators are just wrappers around MLLIB. Also, please note that the pull request linked here hasn't been updated in a long time, and it only resolved the issue for RegressionMetrics in MLLIB: "SPARK-11520 RegressionMetrics should support instance weights". I'm still planning out the changes that need to be made; since this one looks nontrivial, any suggestions from Spark folks? > BinaryClassificationEvaluator, RegressionEvaluator, and > MulticlassClassificationEvaluator should use sample weight data > --- > > Key: SPARK-18693 > URL: https://issues.apache.org/jira/browse/SPARK-18693 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.2 >Reporter: Devesh Parekh > > The LogisticRegression and LinearRegression models support training with a > weight column, but the corresponding evaluators do not support computing > metrics using those weights. This breaks model selection using CrossValidator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18698: -- Shepherd: Joseph K. Bradley Affects Version/s: (was: 2.0.2) Target Version/s: 2.2.0 > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Bjoern Toldbod >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipeline stages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-18698: -- Issue Type: Improvement (was: Wish) > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 2.0.2 >Reporter: Bjoern Toldbod >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipeline stages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18805) InternalMapWithStateDStream makes java.lang.StackOverflowError
[ https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15786000#comment-15786000 ] Shixiong Zhu commented on SPARK-18805: -- @etienne That should not be an infinite loop: the time is different on each call. Do you have the beginning of the stack trace? SPARK-6847 may be related, but you can still reproduce it in 2.0.2. > InternalMapWithStateDStream makes java.lang.StackOverflowError > -- > > Key: SPARK-18805 > URL: https://issues.apache.org/jira/browse/SPARK-18805 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 > Environment: mesos >Reporter: etienne > > When loading InternalMapWithStateDStream from a checkpoint, if isValidTime is true and there is no generatedRDD at the given time, there is an infinite loop: > 1) compute is called on InternalMapWithStateDStream > 2) InternalMapWithStateDStream tries to generate the previous RDD > 3) The stream looks in generatedRDD to check whether the RDD is already generated for the given > time > 4) It does not find the RDD, so it checks whether the time is valid.
> 5) if the time is valid call compute on InternalMapWithStateDStream > 6) restart from 1) > Here the exception that illustrate this error > {code} > Exception in thread "streaming-start" java.lang.StackOverflowError > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330) > at > org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1.apply(DStream.scala:340) > at > org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415) > at > 
org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:335) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:333) > at scala.Option.orElse(Option.scala:289) > at > org.apache.spark.streaming.dstream.DStream.getOrCompute(DStream.scala:330) > at > org.apache.spark.streaming.dstream.InternalMapWithStateDStream.compute(MapWithStateDStream.scala:134) > at > org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1$$anonfun$1$$anonfun$apply$7.apply(DStream.scala:341) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
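The looping steps in the description can be modeled without Spark. The following is a deliberately simplified sketch (plain Scala, not DStream code) of why memoizing each generated value per batch time bounds the mutual recursion between compute and getOrCompute:

```scala
import scala.collection.mutable

// Stand-in for DStream.generatedRDDs: batch time -> generated value.
val generated = mutable.Map[Int, String]()

// Consults the cache first, mirroring step 3 of the description.
def getOrCompute(time: Int): String =
  generated.getOrElseUpdate(time, compute(time))

// Depends on the previous batch, mirroring steps 1-2 of the description.
def compute(time: Int): String =
  if (time <= 0) "initial-state"
  else s"state(${getOrCompute(time - 1)})"
```

Restoring from a checkpoint with an empty cache makes the recursion depth proportional to how far back the times considered valid reach, which matches the deep repeated getOrCompute/compute frames in the stack trace.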
[jira] [Commented] (SPARK-16402) JDBC source: Implement save API
[ https://issues.apache.org/jira/browse/SPARK-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785990#comment-15785990 ] Nicholas Chammas commented on SPARK-16402: -- [~JustinPihony], [~smilegator] - Does the resolution on SPARK-14525 also resolve this issue? > JDBC source: Implement save API > --- > > Key: SPARK-16402 > URL: https://issues.apache.org/jira/browse/SPARK-16402 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 2.0.0 >Reporter: Xiao Li > > Currently, we are unable to call the `save` API of `DataFrameWriter` when the > source is JDBC. For example, > {noformat} > df.write > .format("jdbc") > .option("url", url1) > .option("dbtable", "TEST.TRUNCATETEST") > .option("user", "testUser") > .option("password", "testPass") > .save() > {noformat} > The error message users will get is like > {noformat} > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > java.lang.RuntimeException: > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider does not > allow create table as select. > {noformat} > However, the `save` API is very common for all the data sources, like parquet. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Devesh Parekh updated SPARK-18693: -- Description: The LogisticRegression and LinearRegression models support training with a weight column, but the corresponding evaluators do not support computing metrics using those weights. This breaks model selection using CrossValidator. (was: The LogisticRegression and LinearRegression models support training with a weight column, but the corresponding evaluators do not support computing metrics using those weights.) > BinaryClassificationEvaluator, RegressionEvaluator, and > MulticlassClassificationEvaluator should use sample weight data > --- > > Key: SPARK-18693 > URL: https://issues.apache.org/jira/browse/SPARK-18693 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.2 >Reporter: Devesh Parekh > > The LogisticRegression and LinearRegression models support training with a > weight column, but the corresponding evaluators do not support computing > metrics using those weights. This breaks model selection using CrossValidator. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18942) Support output operations for kinesis
[ https://issues.apache.org/jira/browse/SPARK-18942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785953#comment-15785953 ] Shixiong Zhu commented on SPARK-18942: -- Thanks for your prototype. Actually, you can just implement an RDD action, or a DataFrame DataSource, and publish them as a Spark package like [spark-redshift|https://github.com/databricks/spark-redshift]. [Spark Packages|https://spark-packages.org/] is a better place for such third-party data sources. > Support output operations for kinesis > - > > Key: SPARK-18942 > URL: https://issues.apache.org/jira/browse/SPARK-18942 > Project: Spark > Issue Type: New Feature > Components: DStreams >Affects Versions: 2.0.2 >Reporter: Takeshi Yamamuro >Priority: Trivial > > Spark does not support output operations (e.g. DStream#saveAsTextFile) for > Kinesis, so officially supporting this would be useful for some AWS users. A usage of the output operations is assumed as follows: > {code} > // Import a class that includes an output function > scala> import org.apache.spark.streaming.kinesis.KinesisDStreamFunctions._ > // Create a DStream > scala> val stream: DStream[String] = ... > // Define a handler to convert the DStream type for output > scala> val msgHandler = (s: String) => s.getBytes("UTF-8") > // Define the output operation > scala> stream.count().saveAsKinesisStream(streamName, endpointUrl, > msgHandler) > {code} > A prototype I made is here: > https://github.com/apache/spark/compare/master...maropu:OutputOpForKinesis -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
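A `saveAsKinesisStream` method imported from `KinesisDStreamFunctions._` would be added through Scala's implicit-class enrichment pattern. A self-contained sketch of just that pattern (a plain Seq stands in for a DStream, an in-memory buffer stands in for Kinesis, and all names are illustrative, not the prototype's actual API):

```scala
import scala.collection.mutable.ArrayBuffer

object KinesisSinkSketch {
  // Stand-in for the Kinesis stream: records "sent" so far.
  val sent = ArrayBuffer[Array[Byte]]()

  // Enrichment: adds saveAsKinesisStream to any Seq, the way an imported
  // implicit class would add it to a DStream.
  implicit class SaveOps[A](records: Seq[A]) {
    def saveAsKinesisStream(streamName: String, msgHandler: A => Array[Byte]): Unit =
      records.foreach(r => KinesisSinkSketch.sent += msgHandler(r))
  }
}
```

Importing the object's members brings the extra method into scope, which is why the quoted example begins with an import.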
[jira] [Commented] (SPARK-15493) Allow setting the quoteEscapingEnabled flag when writing CSV
[ https://issues.apache.org/jira/browse/SPARK-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785951#comment-15785951 ] Jacob Wellington commented on SPARK-15493: -- I'm running into an issue where this doesn't seem to be working for the SQL interface. I'm connecting to the thrift server using beeline and submitting the following sql: {quote} CREATE TABLE e2 USING csv OPTIONS (path 'test.csv', quote '"', escapeQuotes 'false', quoteEscapingEnabled 'false') AS SELECT '"G"' FROM parquet.`test.parquet`; DROP TABLE e2; {quote} When I look at the test.csv output I get this: {quote} "\"G\"" {quote} I'm using spark 2.0.2 with its version of beeline and its hive server. I've also tried multiple variations of the options. > Allow setting the quoteEscapingEnabled flag when writing CSV > > > Key: SPARK-15493 > URL: https://issues.apache.org/jira/browse/SPARK-15493 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 2.0.0 >Reporter: Jurriaan Pruis >Assignee: Jurriaan Pruis > Fix For: 2.0.0 > > > See > https://github.com/uniVocity/univocity-parsers/blob/f3eb2af26374940e60d91d1703bde54619f50c51/src/main/java/com/univocity/parsers/csv/CsvWriterSettings.java#L231-L247 > This kind of functionality is needed to be able to write RFC 4180 > (https://tools.ietf.org/html/rfc4180#section-2) / Amazon Redshift compatible > CSV files > (https://docs.aws.amazon.com/redshift/latest/dg/copy-parameters-data-format.html#copy-csv) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
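For reference, the two escaping styles at issue can be shown without Spark: RFC 4180 escapes an embedded quote by doubling it, while the output reported above uses backslash escaping. The helpers below are illustrative sketches, not Spark APIs:

```scala
// RFC 4180 style: an embedded quote is escaped by doubling it ("" inside
// a quoted field).
def rfc4180Field(s: String): String =
  "\"" + s.replace("\"", "\"\"") + "\""

// Backslash style: produces output shaped like the "\"G\"" reported above.
def backslashField(s: String): String =
  "\"" + s.replace("\"", "\\\"") + "\""
```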
[jira] [Commented] (SPARK-18974) FileInputDStream could not detect files which moved to the directory
[ https://issues.apache.org/jira/browse/SPARK-18974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785930#comment-15785930 ] Shixiong Zhu commented on SPARK-18974: -- Do you want to try Structured Streaming? Its FileStreamSource allows files up to 7 days old by default. > FileInputDStream could not detect files which moved to the directory > --- > > Key: SPARK-18974 > URL: https://issues.apache.org/jira/browse/SPARK-18974 > Project: Spark > Issue Type: Bug > Components: DStreams >Affects Versions: 1.6.3, 2.0.2 >Reporter: Adam Wang > > FileInputDStream uses modification time to find new files, but if a file is moved into the directory its modification time does not change, so FileInputDStream cannot detect these files. > I think a way to fix this bug is to get access_time and compare against it, but that needs a Set of files to record all old files, which would be very inefficient for a directory with many files. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
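The reporter's concern can be made concrete with a small sketch. This is not Spark's implementation; it just illustrates tracking seen file names instead of modification times, including the memory trade-off the report mentions:

```scala
import scala.collection.mutable

// Remembers every file name ever listed; a file moved into the directory
// is picked up because its name is new, regardless of its mod time.
// The trade-off: `seen` grows without bound as files accumulate.
class SeenFileTracker {
  private val seen = mutable.Set[String]()

  /** Returns the entries in `listing` not seen in any earlier listing. */
  def newFiles(listing: Seq[String]): Seq[String] = {
    val fresh = listing.filterNot(seen)
    seen ++= fresh
    fresh
  }
}
```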
[jira] [Updated] (SPARK-18359) Let user specify locale in CSV parsing
[ https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-18359: - Component/s: (was: Spark Core) > Let user specify locale in CSV parsing > -- > > Key: SPARK-18359 > URL: https://issues.apache.org/jira/browse/SPARK-18359 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0, 2.0.1 >Reporter: yannick Radji > > On the DataFrameReader object there is no CSV-specific option to set the decimal separator to a comma rather than a dot, as is customary in France and elsewhere in Europe. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18404) RPC call from executor to driver blocks when getting map output locations (Netty Only)
[ https://issues.apache.org/jira/browse/SPARK-18404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785914#comment-15785914 ] Shixiong Zhu commented on SPARK-18404: -- That's pretty weird. It's a blocking call for both netty and akka rpc. > RPC call from executor to driver blocks when getting map output locations > (Netty Only) > -- > > Key: SPARK-18404 > URL: https://issues.apache.org/jira/browse/SPARK-18404 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.0 >Reporter: Jeffrey Shmain > > Compared an identical application run on Spark 1.5 and Spark 1.6 and noticed that > jobs became slower. After looking at it closer, found that 75% of tasks > finished the same or better, and 25% had significant delays (unrelated to data > skew and GC). > After more debugging, noticed that the executors are blocking for a few seconds > (sometimes 25) on this call: > https://github.com/apache/spark/blob/39e2bad6a866d27c3ca594d15e574a1da3ee84cc/core/src/main/scala/org/apache/spark/MapOutputTracker.scala#L199 >logInfo("Doing the fetch; tracker endpoint = " + trackerEndpoint) > // This try-finally prevents hangs due to timeouts: > try { > val fetchedBytes = > askTracker[Array[Byte]](GetMapOutputStatuses(shuffleId)) > fetchedStatuses = > MapOutputTracker.deserializeMapStatuses(fetchedBytes) > logInfo("Got the output locations") > So the regression seems to be related to changing the default RPC implementation from Akka to > Netty. > This was an application working with RDDs, submitting 10 concurrent queries > at a time. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19012) CreateOrReplaceTempView throws org.apache.spark.sql.catalyst.parser.ParseException when viewName first char is numerical
[ https://issues.apache.org/jira/browse/SPARK-19012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-19012: - Component/s: (was: Spark Core) SQL > CreateOrReplaceTempView throws > org.apache.spark.sql.catalyst.parser.ParseException when viewName first char > is numerical > > > Key: SPARK-19012 > URL: https://issues.apache.org/jira/browse/SPARK-19012 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.1, 2.0.2 >Reporter: Jork Zijlstra > > Using a viewName where the the fist char is a numerical value on > dataframe.createOrReplaceTempView(viewName: String) causes: > {code} > Exception in thread "main" > org.apache.spark.sql.catalyst.parser.ParseException: > mismatched input '1468079114' expecting {'SELECT', 'FROM', 'ADD', 'AS', > 'ALL', 'DISTINCT', 'WHERE', 'GROUP', 'BY', 'GROUPING', 'SETS', 'CUBE', > 'ROLLUP', 'ORDER', 'HAVING', 'LIMIT', 'AT', 'OR', 'AND', 'IN', NOT, 'NO', > 'EXISTS', 'BETWEEN', 'LIKE', RLIKE, 'IS', 'NULL', 'TRUE', 'FALSE', 'NULLS', > 'ASC', 'DESC', 'FOR', 'INTERVAL', 'CASE', 'WHEN', 'THEN', 'ELSE', 'END', > 'JOIN', 'CROSS', 'OUTER', 'INNER', 'LEFT', 'SEMI', 'RIGHT', 'FULL', > 'NATURAL', 'ON', 'LATERAL', 'WINDOW', 'OVER', 'PARTITION', 'RANGE', 'ROWS', > 'UNBOUNDED', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'WITH', 'VALUES', > 'CREATE', 'TABLE', 'VIEW', 'REPLACE', 'INSERT', 'DELETE', 'INTO', 'DESCRIBE', > 'EXPLAIN', 'FORMAT', 'LOGICAL', 'CODEGEN', 'CAST', 'SHOW', 'TABLES', > 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'DROP', 'UNION', > 'EXCEPT', 'INTERSECT', 'TO', 'TABLESAMPLE', 'STRATIFY', 'ALTER', 'RENAME', > 'ARRAY', 'MAP', 'STRUCT', 'COMMENT', 'SET', 'RESET', 'DATA', 'START', > 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'MACRO', 'IF', 'DIV', 'PERCENT', > 'BUCKET', 'OUT', 'OF', 'SORT', 'CLUSTER', 'DISTRIBUTE', 'OVERWRITE', > 'TRANSFORM', 'REDUCE', 'USING', 'SERDE', 'SERDEPROPERTIES', 'RECORDREADER', > 'RECORDWRITER', 'DELIMITED', 'FIELDS', 'TERMINATED', 'COLLECTION', 
'ITEMS', > 'KEYS', 'ESCAPED', 'LINES', 'SEPARATED', 'FUNCTION', 'EXTENDED', 'REFRESH', > 'CLEAR', 'CACHE', 'UNCACHE', 'LAZY', 'FORMATTED', TEMPORARY, 'OPTIONS', > 'UNSET', 'TBLPROPERTIES', 'DBPROPERTIES', 'BUCKETS', 'SKEWED', 'STORED', > 'DIRECTORIES', 'LOCATION', 'EXCHANGE', 'ARCHIVE', 'UNARCHIVE', 'FILEFORMAT', > 'TOUCH', 'COMPACT', 'CONCATENATE', 'CHANGE', 'CASCADE', 'RESTRICT', > 'CLUSTERED', 'SORTED', 'PURGE', 'INPUTFORMAT', 'OUTPUTFORMAT', DATABASE, > DATABASES, 'DFS', 'TRUNCATE', 'ANALYZE', 'COMPUTE', 'LIST', 'STATISTICS', > 'PARTITIONED', 'EXTERNAL', 'DEFINED', 'REVOKE', 'GRANT', 'LOCK', 'UNLOCK', > 'MSCK', 'REPAIR', 'RECOVER', 'EXPORT', 'IMPORT', 'LOAD', 'ROLE', 'ROLES', > 'COMPACTIONS', 'PRINCIPALS', 'TRANSACTIONS', 'INDEX', 'INDEXES', 'LOCKS', > 'OPTION', 'ANTI', 'LOCAL', 'INPATH', 'CURRENT_DATE', 'CURRENT_TIMESTAMP', > IDENTIFIER, BACKQUOTED_IDENTIFIER}(line 1, pos 0) > == SQL == > 1 > {code} > {code} > val tableOrViewName = "1" //fails > val tableOrViewName = "a" //works > sparkSession.read.orc(path).createOrReplaceTempView(tableOrViewName) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19028) Fixed non-thread-safe functions used in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-19028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19028: Assignee: Xiao Li (was: Apache Spark) > Fixed non-thread-safe functions used in SessionCatalog > -- > > Key: SPARK-19028 > URL: https://issues.apache.org/jira/browse/SPARK-19028 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Fixed non-thread-safe functions used in SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19028) Fixed non-thread-safe functions used in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-19028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19028: Assignee: Apache Spark (was: Xiao Li) > Fixed non-thread-safe functions used in SessionCatalog > -- > > Key: SPARK-19028 > URL: https://issues.apache.org/jira/browse/SPARK-19028 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Xiao Li >Assignee: Apache Spark > > Fixed non-thread-safe functions used in SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19028) Fixed non-thread-safe functions used in SessionCatalog
[ https://issues.apache.org/jira/browse/SPARK-19028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785827#comment-15785827 ] Apache Spark commented on SPARK-19028: -- User 'gatorsmile' has created a pull request for this issue: https://github.com/apache/spark/pull/16437 > Fixed non-thread-safe functions used in SessionCatalog > -- > > Key: SPARK-19028 > URL: https://issues.apache.org/jira/browse/SPARK-19028 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.0 >Reporter: Xiao Li >Assignee: Xiao Li > > Fixed non-thread-safe functions used in SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19028) Fixed non-thread-safe functions used in SessionCatalog
Xiao Li created SPARK-19028: --- Summary: Fixed non-thread-safe functions used in SessionCatalog Key: SPARK-19028 URL: https://issues.apache.org/jira/browse/SPARK-19028 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.1.0, 2.0.2 Reporter: Xiao Li Assignee: Xiao Li Fixed non-thread-safe functions used in SessionCatalog. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18737) Serialization setting "spark.serializer" ignored in Spark 2.x
[ https://issues.apache.org/jira/browse/SPARK-18737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785715#comment-15785715 ] Josh Bacon commented on SPARK-18737: I think this issue may be related to the following issues: https://issues.apache.org/jira/browse/SPARK-18560 https://issues.apache.org/jira/browse/SPARK-18617 > Serialization setting "spark.serializer" ignored in Spark 2.x > - > > Key: SPARK-18737 > URL: https://issues.apache.org/jira/browse/SPARK-18737 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.0, 2.0.1 >Reporter: Dr. Michael Menzel > > The following exception occurs although the JavaSerializer has been activated: > 16/11/22 10:49:24 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID > 77, ip-10-121-14-147.eu-central-1.compute.internal, partition 1, RACK_LOCAL, > 5621 bytes) > 16/11/22 10:49:24 INFO YarnSchedulerBackend$YarnDriverEndpoint: Launching > task 77 on executor id: 2 hostname: > ip-10-121-14-147.eu-central-1.compute.internal. 
> 16/11/22 10:49:24 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory > on ip-10-121-14-147.eu-central-1.compute.internal:45059 (size: 879.0 B, free: > 410.4 MB) > 16/11/22 10:49:24 WARN TaskSetManager: Lost task 0.0 in stage 9.0 (TID 77, > ip-10-121-14-147.eu-central-1.compute.internal): > com.esotericsoftware.kryo.KryoException: Encountered unregistered class ID: > 13994 > at > com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:137) > at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:670) > at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:781) > at > org.apache.spark.serializer.KryoDeserializationStream.readObject(KryoSerializer.scala:229) > at > org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:169) > at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73) > at scala.collection.Iterator$class.foreach(Iterator.scala:893) > at org.apache.spark.util.NextIterator.foreach(NextIterator.scala:21) > at > scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104) > at > scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48) > at > scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310) > at org.apache.spark.util.NextIterator.to(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302) > at org.apache.spark.util.NextIterator.toBuffer(NextIterator.scala:21) > at > scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289) > at org.apache.spark.util.NextIterator.toArray(NextIterator.scala:21) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at > org.apache.spark.rdd.RDD$$anonfun$toLocalIterator$1$$anonfun$org$apache$spark$rdd$RDD$$anonfun$$collectPartition$1$1.apply(RDD.scala:927) > at 
> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at > org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1916) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70) > at org.apache.spark.scheduler.Task.run(Task.scala:86) > at > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > The code runs perfectly with Spark 1.6.0. Since we moved to 2.0.0 and now > 2.0.1, we see the Kryo deserialization exception, and over time the Spark > streaming job stops processing because too many tasks have failed. > Our action was to use conf.set("spark.serializer", > "org.apache.spark.serializer.JavaSerializer") and to disable Kryo class > registration with conf.set("spark.kryo.registrationRequired", false). We hope > to identify the root cause of the exception. > However, setting the serializer to JavaSerializer is obviously ignored by the > Spark internals. Despite the setting, we still see the exception printed in > the log and tasks fail. The occurrence seems to be non-deterministic, but to > become more frequent over time. > Several questions we could not answer during our troubleshooting: > 1. How can the debug log for Kryo be enabled? -- We tried following the > MinLog documentation, but no output can be found. > 2. Is the serializer setting effective
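For reference, the reporter's workaround boils down to two settings. A minimal sketch collecting them as plain key/value pairs (mimicking a chain of SparkConf.set calls; the keys are the ones quoted in the report):

```python
def apply_settings(conf, settings):
    """Merge settings into a conf dict without mutating the original,
    mimicking a chain of SparkConf.set(key, value) calls."""
    merged = dict(conf)
    merged.update(settings)
    return merged

# The reporter's attempted workaround: force Java serialization and drop
# Kryo's registration requirement. Values are strings, as Spark conf expects.
workaround = {
    "spark.serializer": "org.apache.spark.serializer.JavaSerializer",
    "spark.kryo.registrationRequired": "false",
}

base = {"spark.app.name": "demo"}
conf = apply_settings(base, workaround)
```

The bug report is precisely that these settings appear to be ignored by some internal code paths, so applying them is not a reliable fix.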
[jira] [Commented] (SPARK-18883) FileNotFoundException on _temporary directory
[ https://issues.apache.org/jira/browse/SPARK-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785634#comment-15785634 ] Steve Loughran commented on SPARK-18883: thanks, good to know > FileNotFoundException on _temporary directory > -- > > Key: SPARK-18883 > URL: https://issues.apache.org/jira/browse/SPARK-18883 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2 > Environment: We're on a CDH 5.7, Hadoop 2.6. >Reporter: Mathieu D > > I'm experiencing the following exception, usually after some time with heavy > load : > {code} > 16/12/15 11:25:18 ERROR InsertIntoHadoopFsRelationCommand: Aborting job. > java.io.FileNotFoundException: File > hdfs://nameservice1/user/xdstore/rfs/rfsDB/_temporary/0 does not exist. > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:795) > at > org.apache.hadoop.hdfs.DistributedFileSystem.access$700(DistributedFileSystem.java:106) > at > org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:853) > at > org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:849) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:860) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1517) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1557) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getAllCommittedTaskPaths(FileOutputCommitter.java:291) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJobInternal(FileOutputCommitter.java:361) > at > org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:334) > at > org.apache.parquet.hadoop.ParquetOutputCommitter.commitJob(ParquetOutputCommitter.java:46) > at > 
org.apache.spark.sql.execution.datasources.BaseWriterContainer.commitJob(WriterContainer.scala:222) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply$mcV$sp(InsertIntoHadoopFsRelationCommand.scala:144) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand$$anonfun$run$1.apply(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:115) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:115) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:136) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:133) > at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:114) > at > org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:86) > at > org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:525) > at > org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:211) > at > 
org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:194) > at > org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:488) > at > com.bluedme.woda.ng.indexer.RfsRepository.append(RfsRepository.scala:36) > at > com.bluedme.woda.ng.indexer.RfsRepository.insert(RfsRepository.scala:23) > at > com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:33) > at > com.bluedme.woda.cmd.ShareDatasetImpl.runImmediate(ShareDatasetImpl.scala:13) > at >
[jira] [Assigned] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18698: Assignee: Apache Spark > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Wish > Components: ML >Affects Versions: 2.0.2 >Reporter: Bjoern Toldbod >Assignee: Apache Spark >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipelinestages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-16494) Upgrade breeze version to 0.12
[ https://issues.apache.org/jira/browse/SPARK-16494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785629#comment-15785629 ] koert kuipers commented on SPARK-16494: --- I just ran into an issue because of this when trying to upgrade to Spark 2.1.0: breeze 0.12 introduces a dependency on shapeless 2.0.0, which is old (April 2014) and not compatible with the version(s) we are using. > Upgrade breeze version to 0.12 > -- > > Key: SPARK-16494 > URL: https://issues.apache.org/jira/browse/SPARK-16494 > Project: Spark > Issue Type: Improvement > Components: ML, MLlib >Reporter: Yanbo Liang >Assignee: Yanbo Liang >Priority: Minor > Fix For: 2.1.0 > > > breeze 0.12 has been released for more than half a year, and it brings lots > of new features, performance improvements and bug fixes. > One of the biggest features is LBFGS-B, which is an implementation of LBFGS > with box constraints and is much faster for some special cases. > We would like to implement the Huber loss function for {{LinearRegression}} > (SPARK-3181) and it requires LBFGS-B as the optimization solver. So we should > bump up the dependent breeze version to 0.12. > For more features, improvements and bug fixes of breeze 0.12, you can refer to > the following link: > https://groups.google.com/forum/#!topic/scala-breeze/nEeRi_DcY5c
[jira] [Assigned] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18698: Assignee: (was: Apache Spark) > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Wish > Components: ML >Affects Versions: 2.0.2 >Reporter: Bjoern Toldbod >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipelinestages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785627#comment-15785627 ] Apache Spark commented on SPARK-18698: -- User 'imatiach-msft' has created a pull request for this issue: https://github.com/apache/spark/pull/16436 > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Wish > Components: ML >Affects Versions: 2.0.2 >Reporter: Bjoern Toldbod >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipelinestages makes it much easier to work > with complex models) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19027) estimate size of object buffer for object hash aggregate
[ https://issues.apache.org/jira/browse/SPARK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19027: Assignee: Wenchen Fan (was: Apache Spark) > estimate size of object buffer for object hash aggregate > > > Key: SPARK-19027 > URL: https://issues.apache.org/jira/browse/SPARK-19027 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19027) estimate size of object buffer for object hash aggregate
[ https://issues.apache.org/jira/browse/SPARK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785604#comment-15785604 ] Apache Spark commented on SPARK-19027: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/16435 > estimate size of object buffer for object hash aggregate > > > Key: SPARK-19027 > URL: https://issues.apache.org/jira/browse/SPARK-19027 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Wenchen Fan > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19027) estimate size of object buffer for object hash aggregate
[ https://issues.apache.org/jira/browse/SPARK-19027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19027: Assignee: Apache Spark (was: Wenchen Fan) > estimate size of object buffer for object hash aggregate > > > Key: SPARK-19027 > URL: https://issues.apache.org/jira/browse/SPARK-19027 > Project: Spark > Issue Type: Improvement > Components: SQL >Reporter: Wenchen Fan >Assignee: Apache Spark > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19027) estimate size of object buffer for object hash aggregate
Wenchen Fan created SPARK-19027: --- Summary: estimate size of object buffer for object hash aggregate Key: SPARK-19027 URL: https://issues.apache.org/jira/browse/SPARK-19027 Project: Spark Issue Type: Improvement Components: SQL Reporter: Wenchen Fan Assignee: Wenchen Fan -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17346) Kafka 0.10 support in Structured Streaming
[ https://issues.apache.org/jira/browse/SPARK-17346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785560#comment-15785560 ] koert kuipers commented on SPARK-17346: --- this ticket mentions kafka 0.10-based sinks for structured streaming, but i think only sources are implemented. is there another ticket for sinks? thanks > Kafka 0.10 support in Structured Streaming > -- > > Key: SPARK-17346 > URL: https://issues.apache.org/jira/browse/SPARK-17346 > Project: Spark > Issue Type: Sub-task > Components: Structured Streaming >Reporter: Frederick Reiss >Assignee: Shixiong Zhu > Fix For: 2.0.2, 2.1.0 > > > Implement Kafka 0.10-based sources and sinks for Structured Streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18698) public constructor with uid for IndexToString-class
[ https://issues.apache.org/jira/browse/SPARK-18698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785558#comment-15785558 ] Ilya Matiach commented on SPARK-18698: -- This looks like a minor bug... similar transformers have such a constructor. I can send a pull request for this change. > public constructor with uid for IndexToString-class > --- > > Key: SPARK-18698 > URL: https://issues.apache.org/jira/browse/SPARK-18698 > Project: Spark > Issue Type: Wish > Components: ML >Affects Versions: 2.0.2 >Reporter: Bjoern Toldbod >Priority: Minor > > The IndexToString class in org.apache.spark.ml.feature does not provide a > public constructor which takes a uid string. > It would be nice to have such a constructor. > (Generally, being able to name pipeline stages makes it much easier to work > with complex models)
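The wish is for a constructor overload that accepts the uid, as sibling transformers already expose. A hypothetical Python analogue of the pattern (names are illustrative, not Spark's actual API):

```python
import uuid

class NamedStage:
    """Sketch of a pipeline stage with a public uid-taking constructor,
    falling back to a generated uid when none is supplied."""

    def __init__(self, uid=None):
        # Allow callers to name the stage; generate a uid otherwise.
        self.uid = uid if uid is not None else "idxToStr_" + uuid.uuid4().hex[:8]

named = NamedStage(uid="labelDecoder")  # caller-chosen name, useful in complex models
auto = NamedStage()                     # generated name, as today
```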
[jira] [Commented] (SPARK-17645) Add feature selector methods based on: False Discovery Rate (FDR) and Family Wise Error rate (FWE)
[ https://issues.apache.org/jira/browse/SPARK-17645?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785524#comment-15785524 ] Apache Spark commented on SPARK-17645: -- User 'mpjlu' has created a pull request for this issue: https://github.com/apache/spark/pull/16434 > Add feature selector methods based on: False Discovery Rate (FDR) and Family > Wise Error rate (FWE) > -- > > Key: SPARK-17645 > URL: https://issues.apache.org/jira/browse/SPARK-17645 > Project: Spark > Issue Type: New Feature > Components: ML, MLlib >Reporter: Peng Meng >Assignee: Peng Meng >Priority: Minor > Fix For: 2.2.0 > > Original Estimate: 48h > Remaining Estimate: 48h > > Univariate feature selection works by selecting the best features based on > univariate statistical tests. > FDR and FWE are a popular univariate statistical test for feature selection. > In 2005, the Benjamini and Hochberg paper on FDR was identified as one of the > 25 most-cited statistical papers. The FDR uses the Benjamini-Hochberg > procedure in this PR. https://en.wikipedia.org/wiki/False_discovery_rate. > In statistics, FWE is the probability of making one or more false > discoveries, or type I errors, among all the hypotheses when performing > multiple hypotheses tests. > https://en.wikipedia.org/wiki/Family-wise_error_rate > We add FDR and FWE methods for ChiSqSelector in this PR, like it is > implemented in scikit-learn. > http://scikit-learn.org/stable/modules/feature_selection.html#univariate-feature-selection -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
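The Benjamini-Hochberg procedure behind the FDR option can be sketched in a few lines (a standalone illustration, not the scikit-learn or Spark implementation): rank the p-values, find the largest rank k with p(k) <= (k/m) * alpha, and select the features at ranks 1..k.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a keep/reject flag per feature under the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k whose p-value sits under the BH line (k/m) * alpha.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            k = rank
    selected = set(order[:k])
    return [i in selected for i in range(m)]
```

Note the step-up nature: every feature up to rank k is selected, even when an intermediate p-value sits above its own per-rank threshold.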
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785496#comment-15785496 ] Nattavut Sutyanyong commented on SPARK-19017: - In 3-value logic, true OR unknown = true. Using your formula above, we will have (2,1) NOT IN (1,null) evaluated as (2 <> 1) OR (1 <> null) which is true. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is a null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Issue Comment Deleted] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nattavut Sutyanyong updated SPARK-19017: Comment: was deleted (was: In 3-value logic, true OR unknown = true. Using your formula above, we will have (2,1) NOT IN (1,null) evaluated as (2 <> 1) OR (1 <> null) which is true.) > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is a null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785494#comment-15785494 ] Nattavut Sutyanyong commented on SPARK-19017: - In 3-value logic, true OR unknown = true. Using your formula above, we will have (2,1) NOT IN (1,null) evaluated as (2 <> 1) OR (1 <> null) which is true. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is a null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18693) BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator should use sample weight data
[ https://issues.apache.org/jira/browse/SPARK-18693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785469#comment-15785469 ] Ilya Matiach commented on SPARK-18693: -- I can take a look into fixing this issue. > BinaryClassificationEvaluator, RegressionEvaluator, and > MulticlassClassificationEvaluator should use sample weight data > --- > > Key: SPARK-18693 > URL: https://issues.apache.org/jira/browse/SPARK-18693 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.0.2 >Reporter: Devesh Parekh > > The LogisticRegression and LinearRegression models support training with a > weight column, but the corresponding evaluators do not support computing > metrics using those weights. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
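What the request amounts to: evaluator metrics that fold in a per-row weight. A minimal sketch of a weighted RMSE (illustrative only, not the evaluators' actual code):

```python
import math

def weighted_rmse(predictions, labels, weights):
    """sqrt(sum(w * (p - y)^2) / sum(w)): RMSE honoring a weight column.
    With all weights equal to 1 this reduces to the ordinary RMSE."""
    num = sum(w * (p - y) ** 2 for p, y, w in zip(predictions, labels, weights))
    return math.sqrt(num / sum(weights))
```

Setting a row's weight to zero removes its error contribution entirely, which is the behavior a weight column trained into the model would lead users to expect from the evaluator.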
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785436#comment-15785436 ] Herman van Hovell commented on SPARK-19017: --- Ok, that is fair. Let me correct my mistake. {{NOT IN}} can be rewritten into a sequence of NOT-equal statements. Each statement contains one tuple of the subquery relation. So we would get something like: {noformat} WHERE (NOT (a1 = a2(1) AND b1 = b2(1))) AND (NOT (a1 = a2(2) AND b1 = b2(2))) AND ... AND (NOT (a1 = a2(n) AND b1 = b2(n))) {noformat} Which can be rewritten into: {noformat} WHERE (a1 <> a2(1) OR b1 <> b2(1)) AND (a1 <> a2(2) OR b1 <> b2(2)) AND ... AND (a1 <> a2(n) OR b1 <> b2(n)) {noformat} This would evaluate to null if one of the tuples in the subquery relation contains a null. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code}
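The rewrite above can be executed directly under SQL's three-valued logic. A small Python model (with None standing in for unknown) makes the thread concrete: a null only drives the disjunct to unknown when the other side is not already true, which is the point made in the surrounding comments.

```python
def neq3(x, y):
    # SQL <>: unknown (None) when either side is NULL.
    if x is None or y is None:
        return None
    return x != y

def or3(a, b):
    # Three-valued OR: true dominates, then unknown, then false.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def and3(a, b):
    # Three-valued AND: false dominates, then unknown, then true.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def not_in(row, subquery_rows):
    """(a1,b1) NOT IN (subquery): AND over tuples of (a1<>a2 OR b1<>b2)."""
    result = True
    for sub in subquery_rows:
        pair = or3(neq3(row[0], sub[0]), neq3(row[1], sub[1]))
        result = and3(result, pair)
    return result
```

For the reported data, (2,1) NOT IN ((1,null)) evaluates to true under this model, so the row should be returned; a query that filters it out is dropping a row.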
[jira] [Commented] (SPARK-19017) NOT IN subquery with more than one column may return incorrect results
[ https://issues.apache.org/jira/browse/SPARK-19017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785417#comment-15785417 ] Nattavut Sutyanyong commented on SPARK-19017: - Using your interpretation, (2,1) not in (2,0) would be evaluated to false. Spark returns (2,1). So do many other SQL engines. > NOT IN subquery with more than one column may return incorrect results > -- > > Key: SPARK-19017 > URL: https://issues.apache.org/jira/browse/SPARK-19017 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Nattavut Sutyanyong > > When putting more than one column in the NOT IN, the query may not return > correctly if there is a null data. We can demonstrate the problem with the > following data set and query: > {code} > Seq((2,1)).toDF("a1","b1").createOrReplaceTempView("t1") > Seq[(java.lang.Integer,java.lang.Integer)]((1,null)).toDF("a2","b2").createOrReplaceTempView("t2") > sql("select * from t1 where (a1,b1) not in (select a2,b2 from t2)").show > +---+---+ > | a1| b1| > +---+---+ > +---+---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19026) local directories cannot be cleaned up when creating the "executor-***" directory throws an IOException (e.g. no more free disk space)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15785220#comment-15785220 ] zuotingbing commented on SPARK-19026: - I will submit the code once this issue is accepted. > local directories cannot be cleaned up when creating the "executor-***" directory > throws an IOException (e.g. no more free disk space) > --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing > > I set the SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > When there is no more free disk space on "/data4/spark/tmp", the other local > directories (/data2/spark/tmp, /data3/spark/tmp) cannot be cleaned up when my > application finishes. > We should catch the IOException when creating a local dir throws an exception; > otherwise the variable "appDirectories(appId)" is not set, and the "executor-***" > local directories cannot be deleted for this application. If the > number of "executor-***" folders exceeds 32k we cannot create executors anymore on > this worker node.
[jira] [Updated] (SPARK-19026) local directories cannot be cleaned up when creating the "executor-***" directory throws an IOException (e.g. when there is no more free disk space)
[ https://issues.apache.org/jira/browse/SPARK-19026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zuotingbing updated SPARK-19026: Description: I set the SPARK_LOCAL_DIRS variable like this: SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp When there is no more free disk space on "/data4/spark/tmp", the other local directories (/data2/spark/tmp, /data3/spark/tmp) cannot be cleaned up when my application finishes. We should catch the IOException when creating the local dirs throws an exception; otherwise the variable "appDirectories(appId)" is never set, and the local directories "executor-***" cannot be deleted for this application. If the number of "executor-***" folders exceeds 32k, we cannot create executors anymore on this worker node. was: i set SPARK_LOCAL_DIRS variable like this: SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp when there is no more free disk space on "/data4/spark/tmp" , other local directories (/data2/spark/tmp,/data3/spark/tmp) cannot be cleanuped when my application finished. we should catch the IOExecption when create local dirs throws execption , otherwise the variable "appDirectories(appId)" not be set , then local directories "executor-***" cannot be deleted for this application. If the number of folders "executor-***" > 32k we cannot created executor anymore on this worker node. > local directories cannot be cleaned up when creating the "executor-***" directory > throws an IOException (e.g. when there is no more free disk space)
> --- > > Key: SPARK-19026 > URL: https://issues.apache.org/jira/browse/SPARK-19026 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.5.2, 2.0.2 > Environment: linux >Reporter: zuotingbing > > I set the SPARK_LOCAL_DIRS variable like this: > SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp > When there is no more free disk space on "/data4/spark/tmp", the other local > directories (/data2/spark/tmp, /data3/spark/tmp) cannot be cleaned up when my > application finishes. > We should catch the IOException when creating the local dirs throws an exception; > otherwise the variable "appDirectories(appId)" is never set, and the local > directories "executor-***" cannot be deleted for this application. If the > number of "executor-***" folders exceeds 32k, we cannot create executors anymore on > this worker node.
[jira] [Created] (SPARK-19026) local directories cannot be cleaned up when creating the "executor-***" directory throws an IOException (e.g. when there is no more free disk space)
zuotingbing created SPARK-19026: --- Summary: local directories cannot be cleaned up when creating the "executor-***" directory throws an IOException (e.g. when there is no more free disk space) Key: SPARK-19026 URL: https://issues.apache.org/jira/browse/SPARK-19026 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.0.2, 1.5.2 Environment: linux Reporter: zuotingbing I set the SPARK_LOCAL_DIRS variable like this: SPARK_LOCAL_DIRS=/data2/spark/tmp,/data3/spark/tmp,/data4/spark/tmp When there is no more free disk space on "/data4/spark/tmp", the other local directories (/data2/spark/tmp, /data3/spark/tmp) cannot be cleaned up when my application finishes. We should catch the IOException when creating the local dirs throws an exception; otherwise the variable "appDirectories(appId)" is never set, and the local directories "executor-***" cannot be deleted for this application. If the number of "executor-***" folders exceeds 32k, we cannot create executors anymore on this worker node.
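The pattern the reporter is asking for — catch the creation failure so that directories created before it are still registered for cleanup — can be sketched in Python (all names here are hypothetical and only echo the report's "appDirectories"; Spark's actual code is Scala and structured differently):

```python
import os
import shutil

# appId -> list of executor dirs that were actually created (cf. appDirectories)
app_directories = {}

def create_executor_dirs(app_id, local_roots):
    """Create one executor dir per configured local root.
    Each successfully created dir is recorded *before* the next attempt, and
    OSError (e.g. disk full) is caught, so a failure on one root does not
    lose track of dirs already created on the other roots."""
    created = app_directories.setdefault(app_id, [])
    for root in local_roots:
        path = os.path.join(root, "executor-%s" % app_id)
        try:
            os.makedirs(path, exist_ok=True)
            created.append(path)
        except OSError as e:
            # Without this catch, the whole registration would be skipped and
            # none of the dirs would ever be deleted -- the reported leak.
            print("could not create %s: %s" % (path, e))
    return created

def cleanup_app(app_id):
    """Delete every directory that was recorded for this application."""
    for path in app_directories.pop(app_id, []):
        shutil.rmtree(path, ignore_errors=True)
```

With this shape, a full disk on one root degrades that root only; the remaining "executor-***" directories are still deleted when the application finishes.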
[jira] [Created] (SPARK-19025) Remove SQL builder for operators
Jiang Xingbo created SPARK-19025: Summary: Remove SQL builder for operators Key: SPARK-19025 URL: https://issues.apache.org/jira/browse/SPARK-19025 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Jiang Xingbo With the new approach to view resolution, we can get rid of SQL generation on view creation, so let's remove the SQL builder for operators. Note that, since all SQL generation for operators is defined in one file (org.apache.spark.sql.catalyst.SQLBuilder), it would be trivial to recover it in the future.
[jira] [Created] (SPARK-19024) Don't generate SQL query on CREATE/ALTER a view
Jiang Xingbo created SPARK-19024: Summary: Don't generate SQL query on CREATE/ALTER a view Key: SPARK-19024 URL: https://issues.apache.org/jira/browse/SPARK-19024 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Jiang Xingbo On CREATE/ALTER a view, we no longer need to generate a SQL text string from the LogicalPlan; instead we store the SQL query text, the output schema of the LogicalPlan, and the current database in the CatalogTable. The new view resolution approach will be able to resolve the view. The main advantages are: 1. If you update an underlying view, the current view also gets updated; 2. It gives us a chance to get rid of SQL generation for operators. This should bring in the following changes: 1. Add new params to `CatalogTable` that represent the SQL query text, the output schema of the LogicalPlan, and the current database at the time the view is created; 2. Update the commands `CreateViewCommand` and `AlterViewAsCommand` to get rid of SQL generation in them.
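The three pieces of state this ticket proposes to persist, and how a resolver might later use them, can be modeled with a toy Python sketch (every name here is hypothetical and simplified; Spark's CatalogTable is a Scala case class with a different shape):

```python
from dataclasses import dataclass

@dataclass
class CatalogTable:
    """Toy stand-in for the proposal: store the view's query text verbatim,
    plus the schema and current database captured at CREATE VIEW time."""
    name: str
    view_text: str          # original SQL query text (no SQL generation needed)
    view_schema: list       # output column names of the LogicalPlan
    default_database: str   # database that was current when the view was created

def resolve_view(table, parse, current_db_stack):
    """Re-parse the stored query text with the view's own default database
    pushed as current, so unqualified table names resolve as they did at
    creation time; then report the schema captured at creation."""
    current_db_stack.append(table.default_database)
    try:
        plan = parse(table.view_text)   # caller-supplied parser stub
    finally:
        current_db_stack.pop()          # restore the caller's current database
    return {"plan": plan, "output": table.view_schema}
```

The point of the sketch: because the raw text is stored, updating an underlying table or view changes what `parse` sees the next time the view is resolved, which is exactly the "underlying view updated, current view follows" advantage the ticket describes.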
[jira] [Created] (SPARK-19023) Memory leak on GraphX with an iterative algorithm and checkpoint on the graph
Julien MASSIOT created SPARK-19023: -- Summary: Memory leak on GraphX with an iterative algorithm and checkpoint on the graph Key: SPARK-19023 URL: https://issues.apache.org/jira/browse/SPARK-19023 Project: Spark Issue Type: Bug Components: GraphX Affects Versions: 2.0.2 Reporter: Julien MASSIOT I am facing OOM within a Spark Streaming application with GraphX. While trying to reproduce the issue in a simple application, I was able to identify what appear to be 2 kinds of memory leaks.
*Leak 1* It can be reproduced with this simple Scala application (which simulates more or less what I'm doing in my Spark Streaming application, each iteration of the loop simulating one micro-batch).
{code:title=TestGraph.scala|borderStyle=solid}
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.graphx.Graph
import org.apache.spark.rdd.RDD
import org.apache.spark.graphx._

object TestGraph {
  case class ImpactingEvent(entityInstance: String)
  case class ImpactedNode(entityIsntance: String)
  case class RelationInstance(relationType: String)

  var impactingGraph: Graph[ImpactedNode, RelationInstance] = null

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("TestImpactingPropagation").setMaster("local")
    conf.set("spark.checkpoint.checkpointAllMarkedAncestors", "True")
    val sc = new SparkContext(conf)
    sc.setLogLevel("ERROR")
    val vertices: RDD[(VertexId, ImpactedNode)] = sc.parallelize(Array(
      (1L, ImpactedNode("Node1")),
      (2L, ImpactedNode("Node2")),
      (3L, ImpactedNode("Node3"))))
    val edges: RDD[Edge[RelationInstance]] = sc.parallelize(Array(
      Edge(1L, 2L, RelationInstance("Required")),
      Edge(1L, 2L, RelationInstance("Failover"))))
    impactingGraph = Graph(vertices, edges, null)
    var x = 0
    for (x <- 1 to 10) {
      impactingGraph = propagateEvent(impactingGraph, ImpactingEvent("node1"), sc)
      impactingGraph.checkpoint()
      impactingGraph.edges.count()
      impactingGraph.vertices.count()
    }
    println("Hello")
    Thread.sleep(1000)
  }

  private def propagateEvent(impactingGraph: Graph[ImpactedNode, RelationInstance], event: ImpactingEvent, sc: SparkContext): Graph[ImpactedNode, RelationInstance] = {
    var graph = impactingGraph.mapVertices((id, node) => node).cache
    impactingGraph.unpersist(true)
    graph.cache()
  }
}
{code}
In this simple application, I am just applying a mapVertices transformation on the graph and then doing a checkpoint on the graph. I do this operation 10 times. After the application finishes the loop, I take a heap dump. In this heap dump, I can see 11 "live" GraphImpl instances in memory. My expectation is to have only 1 (the one referenced by the global variable impactingGraph). The "leak" comes from the f function in a MapPartitionsRDD (which is referenced by the partitionsRDD variable of my VertexRDD). This f function contains an outer reference to the graph created in the previous iteration. I can see that in the clearDependencies function of MapPartitionsRDD, the f function is not reset to null. While looking for similar issues, I found this pull request: [https://github.com/apache/spark/pull/3545] In this pull request, the f variable is reset to null in the clearDependencies method of ZippedPartitionsRDD. I do not understand why the same is not done in MapPartitionsRDD. I tried patching spark-core by setting f to null in clearDependencies of MapPartitionsRDD, and it solved my leak in this simple use case. Don't you think the f variable should also be reset to null in MapPartitionsRDD? *Leak 2* Now I'll do the same, but in the propagateEvent method, in addition to the mapVertices, I am doing a joinVertices on the graph.
It can be found in the following application: {code:title=TestGraph.scala|borderStyle=solid} import org.apache.spark.SparkConf import org.apache.spark.SparkContext import org.apache.spark.graphx.Graph import org.apache.spark.rdd.RDD import org.apache.spark.graphx._ object TestGraph { case class ImpactingEvent(entityInstance: String) case class ImpactedNode(entityIsntance:String) case class RelationInstance(relationType : String) var impactingGraph : Graph[ImpactedNode, RelationInstance] = null; def main(args: Array[String]) { val conf = new SparkConf().setAppName("TestImpactingPropagation").setMaster("local") conf.set("spark.checkpoint.checkpointAllMarkedAncestors", "True") val sc = new SparkContext(conf) sc.setLogLevel("ERROR") val vertices: RDD[(VertexId, ImpactedNode)] = sc.parallelize(Array( (1L, ImpactedNode("Node1")), (2L,
[jira] [Assigned] (SPARK-19022) Fix tests dependent on OS due to different newline characters
[ https://issues.apache.org/jira/browse/SPARK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19022: Assignee: Apache Spark > Fix tests dependent on OS due to different newline characters > - > > Key: SPARK-19022 > URL: https://issues.apache.org/jira/browse/SPARK-19022 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Reporter: Hyukjin Kwon >Assignee: Apache Spark >Priority: Minor > > There are two tests failing on Windows due to the different newlines. > {code} > - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds) >"{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" did not equal "{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], 
> "sink" : { >"description" : "sink" > } >}" (StreamingQueryStatusAndProgressSuite.scala:36) > {code} > {code} > - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds) >"{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" did not equal "{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" (StreamingQueryStatusAndProgressSuite.scala:115) >org.scalatest.exceptions.TestFailedException: > {code} > The reason is, {{pretty}} in {{org.json4s.pretty}} writes OS-dependent > newlines but the string defined in the tests are {{\n}}. This ends up with > test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-19022) Fix tests dependent on OS due to different newline characters
[ https://issues.apache.org/jira/browse/SPARK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-19022: Assignee: (was: Apache Spark) > Fix tests dependent on OS due to different newline characters > - > > Key: SPARK-19022 > URL: https://issues.apache.org/jira/browse/SPARK-19022 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Reporter: Hyukjin Kwon >Priority: Minor > > There are two tests failing on Windows due to the different newlines. > {code} > - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds) >"{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" did not equal "{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { 
>"description" : "sink" > } >}" (StreamingQueryStatusAndProgressSuite.scala:36) > {code} > {code} > - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds) >"{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" did not equal "{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" (StreamingQueryStatusAndProgressSuite.scala:115) >org.scalatest.exceptions.TestFailedException: > {code} > The reason is, {{pretty}} in {{org.json4s.pretty}} writes OS-dependent > newlines but the string defined in the tests are {{\n}}. This ends up with > test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19022) Fix tests dependent on OS due to different newline characters
[ https://issues.apache.org/jira/browse/SPARK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784908#comment-15784908 ] Apache Spark commented on SPARK-19022: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/16433 > Fix tests dependent on OS due to different newline characters > - > > Key: SPARK-19022 > URL: https://issues.apache.org/jira/browse/SPARK-19022 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Reporter: Hyukjin Kwon >Priority: Minor > > There are two tests failing on Windows due to the different newlines. > {code} > - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds) >"{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" did not equal "{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", 
>"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" (StreamingQueryStatusAndProgressSuite.scala:36) > {code} > {code} > - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds) >"{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" did not equal "{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" (StreamingQueryStatusAndProgressSuite.scala:115) >org.scalatest.exceptions.TestFailedException: > {code} > The reason is, {{pretty}} in {{org.json4s.pretty}} writes OS-dependent > newlines but the string defined in the tests are {{\n}}. This ends up with > test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-19022) Fix tests dependent on OS due to different newline characters
[ https://issues.apache.org/jira/browse/SPARK-19022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15784894#comment-15784894 ] Hyukjin Kwon commented on SPARK-19022: -- It seems these are (almost) all instances across the tests on Windows. I will double check in the PR again. > Fix tests dependent on OS due to different newline characters > - > > Key: SPARK-19022 > URL: https://issues.apache.org/jira/browse/SPARK-19022 > Project: Spark > Issue Type: Test > Components: Structured Streaming, Tests >Reporter: Hyukjin Kwon >Priority: Minor > > There are two tests failing on Windows due to the different newlines. > {code} > - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds) >"{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", >"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" did not equal "{ > "id" : "39788670-6722-48b7-a248-df6ba08722ac", > "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", > "name" : "myName", > "timestamp" : "2016-12-05T20:54:20.827Z", > "numInputRows" : 678, > "inputRowsPerSecond" : 10.0, > "durationMs" : { >"total" : 0 > }, > "eventTime" : { >"avg" : "2016-12-05T20:54:20.827Z", >"max" : "2016-12-05T20:54:20.827Z", >"min" : "2016-12-05T20:54:20.827Z", >"watermark" : "2016-12-05T20:54:20.827Z" > }, > "stateOperators" : [ { >"numRowsTotal" : 0, >"numRowsUpdated" : 1 > } ], > "sources" : [ { >"description" : "source", 
>"startOffset" : 123, >"endOffset" : 456, >"numInputRows" : 678, >"inputRowsPerSecond" : 10.0 > } ], > "sink" : { >"description" : "sink" > } >}" (StreamingQueryStatusAndProgressSuite.scala:36) > {code} > {code} > - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds) >"{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" did not equal "{ > "message" : "active", > "isDataAvailable" : true, > "isTriggerActive" : false >}" (StreamingQueryStatusAndProgressSuite.scala:115) >org.scalatest.exceptions.TestFailedException: > {code} > The reason is, {{pretty}} in {{org.json4s.pretty}} writes OS-dependent > newlines but the string defined in the tests are {{\n}}. This ends up with > test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-19022) Fix tests dependent on OS due to different newline characters
Hyukjin Kwon created SPARK-19022: Summary: Fix tests dependent on OS due to different newline characters Key: SPARK-19022 URL: https://issues.apache.org/jira/browse/SPARK-19022 Project: Spark Issue Type: Test Components: Structured Streaming, Tests Reporter: Hyukjin Kwon Priority: Minor There are two tests failing on Windows due to the different newlines. {code} - StreamingQueryProgress - prettyJson *** FAILED *** (0 milliseconds) "{ "id" : "39788670-6722-48b7-a248-df6ba08722ac", "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", "name" : "myName", "timestamp" : "2016-12-05T20:54:20.827Z", "numInputRows" : 678, "inputRowsPerSecond" : 10.0, "durationMs" : { "total" : 0 }, "eventTime" : { "avg" : "2016-12-05T20:54:20.827Z", "max" : "2016-12-05T20:54:20.827Z", "min" : "2016-12-05T20:54:20.827Z", "watermark" : "2016-12-05T20:54:20.827Z" }, "stateOperators" : [ { "numRowsTotal" : 0, "numRowsUpdated" : 1 } ], "sources" : [ { "description" : "source", "startOffset" : 123, "endOffset" : 456, "numInputRows" : 678, "inputRowsPerSecond" : 10.0 } ], "sink" : { "description" : "sink" } }" did not equal "{ "id" : "39788670-6722-48b7-a248-df6ba08722ac", "runId" : "422282f1-3b81-4b47-a15d-82dda7e69390", "name" : "myName", "timestamp" : "2016-12-05T20:54:20.827Z", "numInputRows" : 678, "inputRowsPerSecond" : 10.0, "durationMs" : { "total" : 0 }, "eventTime" : { "avg" : "2016-12-05T20:54:20.827Z", "max" : "2016-12-05T20:54:20.827Z", "min" : "2016-12-05T20:54:20.827Z", "watermark" : "2016-12-05T20:54:20.827Z" }, "stateOperators" : [ { "numRowsTotal" : 0, "numRowsUpdated" : 1 } ], "sources" : [ { "description" : "source", "startOffset" : 123, "endOffset" : 456, "numInputRows" : 678, "inputRowsPerSecond" : 10.0 } ], "sink" : { "description" : "sink" } }" (StreamingQueryStatusAndProgressSuite.scala:36) {code} {code} - StreamingQueryStatus - prettyJson *** FAILED *** (0 milliseconds) "{ "message" : "active", "isDataAvailable" : true, "isTriggerActive" : false }" did not equal "{ "message" 
: "active", "isDataAvailable" : true, "isTriggerActive" : false }" (StreamingQueryStatusAndProgressSuite.scala:115) org.scalatest.exceptions.TestFailedException: {code} The reason is, {{pretty}} in {{org.json4s.pretty}} writes OS-dependent newlines but the string defined in the tests are {{\n}}. This ends up with test failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18801) Support resolve a nested view
[ https://issues.apache.org/jira/browse/SPARK-18801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiang Xingbo updated SPARK-18801: - Summary: Support resolve a nested view (was: Add `View` operator to help resolve a view) > Support resolve a nested view > - > > Key: SPARK-18801 > URL: https://issues.apache.org/jira/browse/SPARK-18801 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Jiang Xingbo > > We should be able to resolve a nested view. The main advantage is that if you > update an underlying view, the current view also gets updated. > The new approach should be compatible with older versions of SPARK/HIVE, which > means: > 1. The new approach should be able to resolve the views that were created by > older versions of SPARK/HIVE; > 2. The new approach should be able to resolve the views that are > currently supported by SPARK SQL. > The new approach mainly brings in the following changes: > 1. Add a new operator called `View` to keep track of the CatalogTable > that describes the view, and the output attributes as well as the child of > the view; > 2. Update the `ResolveRelations` rule to resolve the relations and > views; note that a nested view should be resolved correctly; > 3. Add `AnalysisContext` to enable us to still support a view created > with a CTE/Windows query.
[jira] [Updated] (SPARK-18801) Add `View` operator to help resolve a view
[ https://issues.apache.org/jira/browse/SPARK-18801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiang Xingbo updated SPARK-18801: - Description: We should be able to resolve a nested view. The main advantage is that if you update an underlying view, the current view also gets updated. The new approach should be compatible with older versions of SPARK/HIVE, that means: 1. The new approach should be able to resolve the views that created by older versions of SPARK/HIVE; 2. The new approach should be able to resolve the views that are currently supported by SPARK SQL. The new approach mainly brings in the following changes: 1. Add a new operator called `View` to keep track of the CatalogTable that describes the view, and the output attributes as well as the child of the view; 2. Update the `ResolveRelations` rule to resolve the relations and views, note that a nested view should be resolved correctly; 3. Add `AnalysisContext` to enable us to still support a view created with CTE/Windows query. was: We should be able to resolve a nested view. The main advantage is that if you update an underlying view, the current view also gets updated. The new approach should be compatible with older versions of SPARK/HIVE, that means: 1. The new approach should be able to resolve the views that created by older versions of SPARK/HIVE; 2. The new approach should be able to resolve the views that are currently supported by SPARK SQL. The new approach mainly brings in the following changes: 1. Add a new operator called `View` to keep track of the CatalogTable that descripts the view, and the output attributes as well as the child of the view; 2. Update the `ResolveRelations` rule to resolve the relations and views, note that a nested view should be resolved correctly; 3. Add `AnalysisContext` to enable us to still support a view created with CTE/Windows query. 
> Add `View` operator to help resolve a view > -- > > Key: SPARK-18801 > URL: https://issues.apache.org/jira/browse/SPARK-18801 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Jiang Xingbo > > We should be able to resolve a nested view. The main advantage is that if you > update an underlying view, the current view also gets updated. > The new approach should be compatible with older versions of SPARK/HIVE, which > means: > 1. The new approach should be able to resolve the views that were created by > older versions of SPARK/HIVE; > 2. The new approach should be able to resolve the views that are > currently supported by SPARK SQL. > The new approach mainly brings in the following changes: > 1. Add a new operator called `View` to keep track of the CatalogTable > that describes the view, and the output attributes as well as the child of > the view; > 2. Update the `ResolveRelations` rule to resolve the relations and > views; note that a nested view should be resolved correctly; > 3. Add `AnalysisContext` to enable us to still support a view created > with a CTE/Windows query.
[jira] [Updated] (SPARK-18801) Add `View` operator to help resolve a view
[ https://issues.apache.org/jira/browse/SPARK-18801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiang Xingbo updated SPARK-18801: - Description: We should be able to resolve a nested view. The main advantage is that if you update an underlying view, the current view also gets updated. The new approach should be compatible with older versions of SPARK/HIVE, which means: 1. The new approach should be able to resolve the views that were created by older versions of SPARK/HIVE; 2. The new approach should be able to resolve the views that are currently supported by SPARK SQL. The new approach mainly brings in the following changes: 1. Add a new operator called `View` to keep track of the CatalogTable that describes the view, and the output attributes as well as the child of the view; 2. Update the `ResolveRelations` rule to resolve the relations and views; note that a nested view should be resolved correctly; 3. Add `AnalysisContext` to enable us to still support a view created with a CTE/Windows query. was: We should add a new operator called `View` to keep track of the database name used on resolving a view. The analysis rule `ResolveRelations` should also be updated. After that change, we should be able to resolve a nested view. > Add `View` operator to help resolve a view > -- > > Key: SPARK-18801 > URL: https://issues.apache.org/jira/browse/SPARK-18801 > Project: Spark > Issue Type: Sub-task > Components: SQL >Reporter: Jiang Xingbo > > We should be able to resolve a nested view. The main advantage is that if you > update an underlying view, the current view also gets updated. > The new approach should be compatible with older versions of SPARK/HIVE, which > means: > 1. The new approach should be able to resolve the views that were created by > older versions of SPARK/HIVE; > 2. The new approach should be able to resolve the views that are > currently supported by SPARK SQL. > The new approach mainly brings in the following changes: > 1.
Add a new operator called `View` to keep track of the CatalogTable > that describes the view, and the output attributes as well as the child of > the view; > 2. Update the `ResolveRelations` rule to resolve the relations and > views; note that a nested view should be resolved correctly; > 3. Add `AnalysisContext` to enable us to still support a view created > with a CTE/Windows query.