[jira] [Updated] (SPARK-19393) Add `approx_percentile` Dataset/DataFrame API

2017-01-28 Thread Liwei Lin (JIRA)

Liwei Lin updated SPARK-19393:
------------------------------
Summary: Add `approx_percentile` Dataset/DataFrame API  (was: Add `approx_percentile` Dataframe API)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Assigned] (SPARK-19393) Add `approx_percentile` Dataframe API

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19393:
------------------------------------
Assignee: (was: Apache Spark)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Commented] (SPARK-19393) Add `approx_percentile` Dataframe API

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark commented on SPARK-19393:
--------------------------------------

User 'lw-lin' has created a pull request for this issue:
https://github.com/apache/spark/pull/16731

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Assigned] (SPARK-19393) Add `approx_percentile` Dataframe API

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19393:
------------------------------------
Assignee: Apache Spark

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Created] (SPARK-19393) Add `approx_percentile` Dataframe API

2017-01-28 Thread Liwei Lin (JIRA)

Liwei Lin created SPARK-19393:
------------------------------

     Summary: Add `approx_percentile` Dataframe API
         Key: SPARK-19393
         URL: https://issues.apache.org/jira/browse/SPARK-19393
     Project: Spark
  Issue Type: Improvement
  Components: SQL
    Assignee: Unassigned
     Created: 29/Jan/17 07:46
    Priority: Minor
    Reporter: Liwei Lin

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)
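For context, approximate percentiles are already reachable in Spark 2.1 through `DataFrameStatFunctions.approxQuantile` and the `percentile_approx` SQL expression; the ticket asks for a first-class Dataset/DataFrame function. A minimal sketch of the existing routes (data and column names are illustrative):

{code}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

object ApproxPercentileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("approx-percentile-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    val df = Seq(1.0, 2.0, 3.0, 4.0, 5.0).toDF("value")

    // Available since 2.0: approxQuantile on DataFrameStatFunctions; the
    // last argument is the target relative error (0.0 means exact, at
    // higher cost).
    val quartiles = df.stat.approxQuantile("value", Array(0.25, 0.5, 0.75), 0.01)
    println(quartiles.mkString(", "))

    // Available since 2.1: the percentile_approx SQL expression, usable in agg().
    df.agg(expr("percentile_approx(value, 0.5)").as("approx_median")).show()

    spark.stop()
  }
}
{code}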


[jira] [Commented] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark commented on SPARK-19392:
--------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/16733

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Assigned] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19392:
------------------------------------
Assignee: Apache Spark

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Assigned] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19392:
------------------------------------
Assignee: (was: Apache Spark)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Updated] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Takeshi Yamamuro (JIRA)

Takeshi Yamamuro updated SPARK-19392:
-------------------------------------
Description:
In OracleDialect, if you use numeric types in `DataFrameWriter` with the Oracle JDBC driver, it throws the exception below:
{code}
java.util.NoSuchElementException: key not found: scale
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
{code}
This ticket comes from https://www.mail-archive.com/user@spark.apache.org/msg61280.html.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Updated] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Takeshi Yamamuro (JIRA)

Takeshi Yamamuro updated SPARK-19392:
-------------------------------------
Component/s: SQL

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Created] (SPARK-19392) Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect

2017-01-28 Thread Takeshi Yamamuro (JIRA)

Takeshi Yamamuro created SPARK-19392:
-------------------------------------

     Summary: Throw an exception "NoSuchElementException: key not found: scale" in OracleDialect
         Key: SPARK-19392
         URL: https://issues.apache.org/jira/browse/SPARK-19392
     Project: Spark
  Issue Type: Bug
  Affects Versions: 2.1.0
    Assignee: Unassigned
     Created: 29/Jan/17 06:32
    Priority: Minor
    Reporter: Takeshi Yamamuro

In OracleDialect, if you use numeric types in `DataFrameWriter` with the Oracle JDBC driver, it throws the exception below:
{code}
java.util.NoSuchElementException: key not found: scale
  at scala.collection.MapLike$class.default(MapLike.scala:228)
  at scala.collection.AbstractMap.default(Map.scala:59)
  at scala.collection.MapLike$class.apply(MapLike.scala:141)
{code}
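A hypothetical reproduction of the report, assuming a reachable Oracle instance; the connection URL, credentials, and table name are placeholders:

{code}
import java.util.Properties
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{DecimalType, StructField, StructType}

object OracleNumericRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("oracle-numeric-repro").master("local[*]").getOrCreate()

    // A DECIMAL column; per the report, pushing it through the Oracle
    // dialect raises "NoSuchElementException: key not found: scale".
    val schema = StructType(Seq(StructField("amount", DecimalType(10, 2))))
    val rows = spark.sparkContext.parallelize(Seq(
      Row(new java.math.BigDecimal("1.23")),
      Row(new java.math.BigDecimal("4.56"))))
    val df = spark.createDataFrame(rows, schema)

    val props = new Properties()
    props.setProperty("user", "scott")        // placeholder credentials
    props.setProperty("password", "tiger")
    props.setProperty("driver", "oracle.jdbc.OracleDriver")
    df.write.jdbc("jdbc:oracle:thin:@//dbhost:1521/ORCL", "test_numeric", props)

    spark.stop()
  }
}
{code}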

[jira] [Assigned] (SPARK-19368) Very bad performance in BlockMatrix.toIndexedRowMatrix()

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19368:
------------------------------------
Assignee: (was: Apache Spark)

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Assigned] (SPARK-19368) Very bad performance in BlockMatrix.toIndexedRowMatrix()

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark reassigned SPARK-19368:
------------------------------------
Assignee: Apache Spark

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Commented] (SPARK-19368) Very bad performance in BlockMatrix.toIndexedRowMatrix()

2017-01-28 Thread Apache Spark (JIRA)

Apache Spark commented on SPARK-19368:
--------------------------------------

User 'uzadude' has created a pull request for this issue:
https://github.com/apache/spark/pull/16732

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)
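For readers unfamiliar with the call in question, a minimal sketch of the conversion on a toy 4x4 matrix (the data is illustrative):

{code}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.distributed.BlockMatrix

object ToIndexedRowMatrixSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("to-indexed-row-matrix").setMaster("local[*]"))

    // A 4x4 matrix stored as two dense 2x2 blocks on the diagonal
    // (values are column-major within each block).
    val blocks = sc.parallelize(Seq(
      ((0, 0), Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))),
      ((1, 1), Matrices.dense(2, 2, Array(5.0, 6.0, 7.0, 8.0)))))
    val mat = new BlockMatrix(blocks, 2, 2)

    // The conversion whose performance this ticket addresses:
    val rowMat = mat.toIndexedRowMatrix()
    rowMat.rows.collect().foreach(println)

    sc.stop()
  }
}
{code}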


[jira] [Commented] (SPARK-19323) Upgrade breeze to 0.13

2017-01-28 Thread koert kuipers (JIRA)

koert kuipers commented on SPARK-19323:
---------------------------------------

I tried to compile Spark with breeze 0.13-RC1 and ran into a breeze regression; I posted it here: https://github.com/scalanlp/breeze/issues/621
I will create a pull request once I get Spark to compile and pass tests against a new breeze RC.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346-sha1:dbc023d)


[jira] [Comment Edited] (SPARK-14709) spark.ml API for linear SVM

2017-01-28 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15835554#comment-15835554
 ] 

Felix Cheung edited comment on SPARK-14709 at 1/28/17 11:17 PM:


[~josephkb] should we add a SparkR API as one of the follow-up tasks? (I could 
shepherd that)


was (Author: felixcheung):
[~josephkb] should we add SparR API as one follow up tasks? (I could shepherd 
that)

> spark.ml API for linear SVM
> ---
>
> Key: SPARK-14709
> URL: https://issues.apache.org/jira/browse/SPARK-14709
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: yuhao yang
> Fix For: 2.2.0
>
>
> Provide API for SVM algorithm for DataFrames.  I would recommend using 
> OWL-QN, rather than wrapping spark.mllib's SGD-based implementation.
> The API should mimic existing spark.ml.classification APIs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
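For reference, the API this ticket produced is `org.apache.spark.ml.classification.LinearSVC` (available from Spark 2.2, per the Fix Version above). A minimal usage sketch, assuming the sample libsvm data file shipped with the Spark distribution:

{code}
import org.apache.spark.ml.classification.LinearSVC
import org.apache.spark.sql.SparkSession

object LinearSVCSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("linear-svc-sketch").master("local[*]").getOrCreate()

    // Expects the usual spark.ml schema: "label" and "features" columns.
    val training = spark.read.format("libsvm")
      .load("data/mllib/sample_libsvm_data.txt")

    val svc = new LinearSVC()
      .setMaxIter(10)
      .setRegParam(0.1)

    val model = svc.fit(training)
    println(s"Coefficients: ${model.coefficients}  Intercept: ${model.intercept}")

    spark.stop()
  }
}
{code}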



[jira] [Commented] (SPARK-19391) Tweedie GLM API in SparkR

2017-01-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844180#comment-15844180
 ] 

Apache Spark commented on SPARK-19391:
--

User 'actuaryzhang' has created a pull request for this issue:
https://github.com/apache/spark/pull/16729

> Tweedie GLM API in SparkR
> -
>
> Key: SPARK-19391
> URL: https://issues.apache.org/jira/browse/SPARK-19391
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Wayne Zhang
>
> Port Tweedie GLM to SparkR
> https://github.com/apache/spark/pull/16344



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19391) Tweedie GLM API in SparkR

2017-01-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19391:


Assignee: (was: Apache Spark)

> Tweedie GLM API in SparkR
> -
>
> Key: SPARK-19391
> URL: https://issues.apache.org/jira/browse/SPARK-19391
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Wayne Zhang
>
> Port Tweedie GLM to SparkR
> https://github.com/apache/spark/pull/16344



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-19391) Tweedie GLM API in SparkR

2017-01-28 Thread Apache Spark (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-19391:


Assignee: Apache Spark

> Tweedie GLM API in SparkR
> -
>
> Key: SPARK-19391
> URL: https://issues.apache.org/jira/browse/SPARK-19391
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Wayne Zhang
>Assignee: Apache Spark
>
> Port Tweedie GLM to SparkR
> https://github.com/apache/spark/pull/16344



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19359) partition path created by Hive should be deleted after rename a partition with upper-case

2017-01-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844173#comment-15844173
 ] 

Apache Spark commented on SPARK-19359:
--

User 'gatorsmile' has created a pull request for this issue:
https://github.com/apache/spark/pull/16728

> partition path created by Hive should be deleted after rename a partition 
> with upper-case
> -
>
> Key: SPARK-19359
> URL: https://issues.apache.org/jira/browse/SPARK-19359
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Song Jun
>Assignee: Song Jun
>Priority: Minor
> Fix For: 2.2.0
>
>
> The Hive metastore is not case preserving and keeps partition columns with 
> lower-case names.
> If Spark SQL creates a table with an upper-case partition name using 
> HiveExternalCatalog, then when we rename a partition it first calls the 
> HiveClient's renamePartition, which creates a new lower-case partition path; 
> Spark SQL then renames the lower-case path to the upper-case one.
> If the renamed partition is more than one level deep, e.g. A=1/B=2, Hive's 
> renamePartition changes it to a=1/b=2 and Spark SQL renames it to A=1/B=2, 
> but a=1 still exists in the filesystem; we should also delete it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
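To make the rename path concrete, a hypothetical illustration in Spark SQL, assuming an existing SparkSession `spark` with Hive support enabled; the table and partition values are made up:

{code}
spark.sql("CREATE TABLE t (value INT) PARTITIONED BY (A INT, B INT)")
spark.sql("ALTER TABLE t ADD PARTITION (A=1, B=2)")
// Per the report, Hive first materializes a lower-case path (a=1/b=2),
// Spark then renames it to the upper-case target, and the stale a=1
// directory is left behind on the filesystem.
spark.sql("ALTER TABLE t PARTITION (A=1, B=2) RENAME TO PARTITION (A=10, B=20)")
{code}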



[jira] [Created] (SPARK-19391) Tweedie GLM API in SparkR

2017-01-28 Thread Wayne Zhang (JIRA)
Wayne Zhang created SPARK-19391:
---

 Summary: Tweedie GLM API in SparkR
 Key: SPARK-19391
 URL: https://issues.apache.org/jira/browse/SPARK-19391
 Project: Spark
  Issue Type: Improvement
  Components: SparkR
Reporter: Wayne Zhang


Port Tweedie GLM to SparkR
https://github.com/apache/spark/pull/16344



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-11075) Spark SQL Thrift Server authentication issue on kerberized yarn cluster

2017-01-28 Thread Himangshu Borah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844070#comment-15844070
 ] 

Himangshu Borah commented on SPARK-11075:
-

This issue is not resolved. I found the same in Spark 1.6.2. In a Kerberos 
environment where the spark-thrift and HiveServer2 processes run as one user 
(user "hive" in my case), any command executed through the Thrift server is 
executed as that user. But we are trying to impersonate the request as 
another user, "Buser", since the table used in the query is accessible to 
"Buser" only.

How I am connecting:
beeline> !connect 
jdbc:hive2://:/default;principal=hive/something@something.com;hive.server2.proxy.user=Buser;

Then I execute a select on an existing table. The table location has 
permissions like:
Buser:hdfs:drwx-- (700, owner-only permission)

Getting response -
Error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
example_table. org.apache.hadoop.security.AccessControlException: Permission 
denied: user=hive, access=EXECUTE, 
inode="/apps/hive/warehouse/some.db/example_table":Buser:hdfs:drwx--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)

But the same query executes fine through the Hive Thrift server.
The Spark Thrift server is not respecting the property 
hive.server2.proxy.user=Buser; it tries to execute the query as the user 
owning the spark-thrift process.

> Spark SQL Thrift Server authentication issue on kerberized yarn cluster 
> 
>
> Key: SPARK-11075
> URL: https://issues.apache.org/jira/browse/SPARK-11075
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0, 1.5.1
> Environment: hive-1.2.1
> hadoop-2.6.0 config kerbers
>Reporter: Xiaoyu Wang
>
> Using a proxy user to connect to the thrift server via beeline results in a 
> permission exception:
> 1.Start the hive 1.2.1 metastore with user hive
> {code}
> $kinit -kt /tmp/hive.keytab hive/xxx
> $nohup ./hive --service metastore 2>&1 >> ../logs/metastore.log &
> {code}
> 2.Start the spark thrift server with user hive
> {code}
> $kinit -kt /tmp/hive.keytab hive/xxx
> $./start-thriftserver.sh --master yarn
> {code}
> 3.Connect to the thrift server with proxy user hive01
> {code}
> $kinit hive01
> beeline command:!connect 
> jdbc:hive2://xxx:1/default;principal=hive/x...@hadoop.com;kerberosAuthType=kerberos;hive.server2.proxy.user=hive01
> {code}
> 4.Create table and insert data
> {code}
> create table test(name string);
> insert overwrite table test select * from sometable;
> {code}
> the insert sql got exception:
> {noformat}
> Error: org.apache.hadoop.security.AccessControlException: Permission denied: 
> user=hive01, access=WRITE, 
> inode="/user/hive/warehouse/test/.hive-staging_hive_2015-10-10_09-17-15_972_3267668540808140587-2/-ext-1/_temporary/0/task_201510100917_0003_m_00":hive:hadoop:drwxr-xr-x
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:271)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:257)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:238)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:182)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInternal(FSNamesystem.java:3805)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameToInt(FSNamesystem.java:3775)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renameTo(FSNamesystem.java:3739)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rename(NameNodeRpcServer.java:754)
> at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.rename(ClientNamenodeProtocolServerSideTranslatorPB.java:565)
> at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
> at 

[jira] [Commented] (SPARK-5159) Thrift server does not respect hive.server2.enable.doAs=true

2017-01-28 Thread Himangshu Borah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15844068#comment-15844068
 ] 

Himangshu Borah commented on SPARK-5159:


This issue is not resolved. I found the same in Spark 1.6.2. In a Kerberos 
environment where the spark-thrift and HiveServer2 processes run as one user 
(user "hive" in my case), any command executed through the Thrift server is 
executed as that user. But we are trying to impersonate the request as 
another user, "Buser", since the table used in the query is accessible to 
"Buser" only.

How I am connecting:
beeline> !connect 
jdbc:hive2://:/default;principal=hive/something@something.com;hive.server2.proxy.user=Buser;

Then I execute a select on an existing table. The table location has 
permissions like:
Buser:hdfs:drwx-- (700, owner-only permission)

Getting response -
Error: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table 
example_table. org.apache.hadoop.security.AccessControlException: Permission 
denied: user=hive, access=EXECUTE, 
inode="/apps/hive/warehouse/some.db/example_table":Buser:hdfs:drwx--
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkTraverse(FSPermissionChecker.java:259)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:205)

But the same query executes fine through the Hive Thrift server.
The Spark Thrift server is not respecting the property 
hive.server2.proxy.user=Buser; it tries to execute the query as the user 
owning the spark-thrift process.

> Thrift server does not respect hive.server2.enable.doAs=true
> 
>
> Key: SPARK-5159
> URL: https://issues.apache.org/jira/browse/SPARK-5159
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andrew Ray
> Attachments: spark_thrift_server_log.txt
>
>
> I'm currently testing the spark sql thrift server on a kerberos secured 
> cluster in YARN mode. Currently any user can access any table regardless of 
> HDFS permissions as all data is read as the hive user. In HiveServer2 the 
> property hive.server2.enable.doAs=true causes all access to be done as the 
> submitting user. We should do the same.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-18781) Allow MatrixFactorizationModel.predict to skip user/product approximation count

2017-01-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-18781.
---
Resolution: Won't Fix

> Allow MatrixFactorizationModel.predict to skip user/product approximation 
> count
> ---
>
> Key: SPARK-18781
> URL: https://issues.apache.org/jira/browse/SPARK-18781
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Eyal Allweil
>Priority: Minor
>
> When 
> [MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
>  is used, it first calculates an approximation count of the users and 
> products in order to determine the most efficient way to proceed. In many 
> cases, the answer to this question is fixed (typically there are more users 
> than products by an order of magnitude) and this check is unnecessary. Adding 
> a parameter to this predict method to allow choosing the implementation (and 
> skipping the check) would be nice.
> It would be especially nice in development cycles, when you are repeatedly 
> tweaking your model and the pairs you're predicting for, and this 
> approximate count represents a meaningful portion of the time you wait for 
> results.
> I can provide a pull request with this ability added that preserves the 
> existing behavior.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
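For reference, a minimal sketch of the predict call the ticket discusses, assuming an existing SparkContext `sc` (as in spark-shell); the training data is illustrative:

{code}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

val ratings = sc.parallelize(Seq(
  Rating(1, 10, 4.0), Rating(2, 10, 3.0), Rating(1, 20, 5.0)))
val model = ALS.train(ratings, 5, 10)  // rank = 5, iterations = 10

// predict() on an RDD of (user, product) pairs first runs the approximate
// count of users vs. products to pick a join strategy -- the step the
// ticket proposed making skippable.
val userProducts = sc.parallelize(Seq((1, 20), (2, 20)))
val predictions = model.predict(userProducts)
predictions.collect().foreach(println)
{code}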



[jira] [Resolved] (SPARK-14623) add label binarizer

2017-01-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-14623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-14623.
---
Resolution: Won't Fix

> add label binarizer 
> 
>
> Key: SPARK-14623
> URL: https://issues.apache.org/jira/browse/SPARK-14623
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: hujiayin
>Priority: Minor
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> It relates to https://issues.apache.org/jira/browse/SPARK-7445
> Map the labels to 0/1. 
> For example,
> Input:
> "yellow,green,red,green,0"
> The labels: "0, green, red, yellow"
> Output:
> 0, 0, 0, 1
> 0, 1, 0, 0
> 0, 0, 1, 0
> 0, 1, 0, 0
> 1, 0 ,0, 0
> Refer to 
> http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelBinarizer.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
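For what it's worth, roughly the requested behavior is already reachable with existing spark.ml stages. A sketch using StringIndexer plus OneHotEncoder, assuming a spark-shell session with `spark.implicits._` in scope; column names are illustrative:

{code}
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}

val labels = Seq("yellow", "green", "red", "green", "0").toDF("color")

// Index the string labels, then expand each index into a 0/1 vector;
// setDropLast(false) keeps one slot per distinct label.
val indexed = new StringIndexer()
  .setInputCol("color").setOutputCol("colorIndex")
  .fit(labels).transform(labels)
val encoded = new OneHotEncoder()
  .setInputCol("colorIndex").setOutputCol("colorVec")
  .setDropLast(false)
  .transform(indexed)
encoded.show(truncate = false)
{code}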



[jira] [Updated] (SPARK-19384) forget unpersist input dataset in IsotonicRegression

2017-01-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen updated SPARK-19384:
--
Assignee: zhengruifeng

> forget unpersist input dataset in IsotonicRegression
> 
>
> Key: SPARK-19384
> URL: https://issues.apache.org/jira/browse/SPARK-19384
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Trivial
> Fix For: 2.2.0
>
>
> forget unpersist input dataset in IsotonicRegression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19384) forget unpersist input dataset in IsotonicRegression

2017-01-28 Thread Sean Owen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen resolved SPARK-19384.
---
   Resolution: Fixed
Fix Version/s: 2.2.0

Issue resolved by pull request 16718
[https://github.com/apache/spark/pull/16718]

> forget unpersist input dataset in IsotonicRegression
> 
>
> Key: SPARK-19384
> URL: https://issues.apache.org/jira/browse/SPARK-19384
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: zhengruifeng
>Priority: Trivial
> Fix For: 2.2.0
>
>
> forget unpersist input dataset in IsotonicRegression



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
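For context, a sketch of the usual spark.ml persistence pattern such fixes follow (this is not the actual patch; the toy `dataset` and derived `instances` are illustrative names): persist derived data only when the caller's dataset is not already cached, and release it after fitting.

{code}
import org.apache.spark.storage.StorageLevel

// Assuming spark-shell, so `spark` exists; toy data stands in for the
// labeled input that IsotonicRegression derives from the caller's dataset.
val dataset = spark.range(0, 100).toDF("id")
val instances = dataset.rdd.map(r => (r.getLong(0).toDouble, r.getLong(0).toDouble, 1.0))

// Persist only if the caller has not already cached the input...
val handlePersistence = dataset.rdd.getStorageLevel == StorageLevel.NONE
if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

// ... run the (elided) training loop over `instances` ...

// ... and release the cache afterwards -- the step this ticket fixed.
if (handlePersistence) instances.unpersist()
{code}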



[jira] [Updated] (SPARK-19364) Stream Blocks in Storage Persists Forever when Kinesis Checkpoints are enabled and an exception is thrown

2017-01-28 Thread Liwei Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liwei Lin updated SPARK-19364:
--
Component/s: (was: Spark Core)
 DStreams

> Stream Blocks in Storage Persists Forever when Kinesis Checkpoints are 
> enabled and an exception is thrown 
> --
>
> Key: SPARK-19364
> URL: https://issues.apache.org/jira/browse/SPARK-19364
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams
>Affects Versions: 2.0.2
> Environment: ubuntu unix
> spark 2.0.2
> application is java
>Reporter: Andrew Milkowski
>Priority: Blocker
>
> --- update --- we found that the situation below occurs when we encounter
> "com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException: 
> Can't update checkpoint - instance doesn't hold the lease for this shard"
> https://github.com/awslabs/amazon-kinesis-client/issues/108
> We use an S3 directory (and DynamoDB) to store checkpoints, but when this 
> occurs, blocks should not get stuck; they should continue to be evicted 
> gracefully from memory. Obviously the Kinesis library race condition is a 
> problem unto itself...
> -- exception leading to a block not being freed up --
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/mnt/yarn/usercache/hadoop/filecache/24/__spark_libs__7928020266533182031.zip/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 17/01/26 13:52:00 ERROR KinesisRecordProcessor: ShutdownException:  Caught 
> shutdown exception, skipping checkpoint.
> com.amazonaws.services.kinesis.clientlibrary.exceptions.ShutdownException: 
> Can't update checkpoint - instance doesn't hold the lease for this shard
>   at 
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.KinesisClientLibLeaseCoordinator.setCheckpoint(KinesisClientLibLeaseCoordinator.java:120)
>   at 
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.RecordProcessorCheckpointer.advancePosition(RecordProcessorCheckpointer.java:216)
>   at 
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.RecordProcessorCheckpointer.checkpoint(RecordProcessorCheckpointer.java:137)
>   at 
> com.amazonaws.services.kinesis.clientlibrary.lib.worker.RecordProcessorCheckpointer.checkpoint(RecordProcessorCheckpointer.java:103)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$checkpoint$1$$anonfun$apply$1.apply$mcV$sp(KinesisCheckpointer.scala:81)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$checkpoint$1$$anonfun$apply$1.apply(KinesisCheckpointer.scala:81)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$checkpoint$1$$anonfun$apply$1.apply(KinesisCheckpointer.scala:81)
>   at scala.util.Try$.apply(Try.scala:192)
>   at 
> org.apache.spark.streaming.kinesis.KinesisRecordProcessor$.retryRandom(KinesisRecordProcessor.scala:144)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$checkpoint$1.apply(KinesisCheckpointer.scala:81)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$checkpoint$1.apply(KinesisCheckpointer.scala:75)
>   at scala.Option.foreach(Option.scala:257)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer.checkpoint(KinesisCheckpointer.scala:75)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer.org$apache$spark$streaming$kinesis$KinesisCheckpointer$$checkpointAll(KinesisCheckpointer.scala:103)
>   at 
> org.apache.spark.streaming.kinesis.KinesisCheckpointer$$anonfun$1.apply$mcVJ$sp(KinesisCheckpointer.scala:117)
>   at 
> org.apache.spark.streaming.util.RecurringTimer.triggerActionForNextInterval(RecurringTimer.scala:94)
>   at 
> org.apache.spark.streaming.util.RecurringTimer.org$apache$spark$streaming$util$RecurringTimer$$loop(RecurringTimer.scala:106)
>   at 
> org.apache.spark.streaming.util.RecurringTimer$$anon$1.run(RecurringTimer.scala:29)
> Running standard Kinesis stream ingestion with a Java Spark app and creating 
> a DStream: after running for some time, some stream blocks seem to persist 
> forever and are never cleaned up, which eventually leads to memory depletion 
> on workers.
> we even tried cleaning RDD's with the following:
> cleaner = ssc.sparkContext().sc().cleaner().get();
> filtered.foreachRDD(new VoidFunction() {
> @Override
> public void call(JavaRDD rdd) throws Exception {
>

[jira] [Commented] (SPARK-14098) Generate code that get a float/double value in each column from CachedBatch when DataFrame.cache() is called

2017-01-28 Thread Shuai Lin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843996#comment-15843996
 ] 

Shuai Lin commented on SPARK-14098:
---

[~kiszk] It seems the title/description of this ticket no longer matches what 
was done in https://github.com/apache/spark/pull/15219 . Should we update the 
title/description here?

> Generate code that get a float/double value in each column from CachedBatch 
> when DataFrame.cache() is called
> 
>
> Key: SPARK-14098
> URL: https://issues.apache.org/jira/browse/SPARK-14098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Reporter: Kazuaki Ishizaki
>
> When DataFrame.cache() is called, data is stored as column-oriented storage 
> in CachedBatch. The current Catalyst generates Java program to get a value of 
> a column from an InternalRow that is translated from CachedBatch. This issue 
> generates Java code to get a value of a column from CachedBatch. While a 
> column for a cache may be compressed, this issue handles float and double 
> types that are never compressed. 
> Other primitive types, whose column may be compressed, will be addressed in 
> another entry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19336) LinearSVC Python API

2017-01-28 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15843983#comment-15843983
 ] 

Apache Spark commented on SPARK-19336:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/16727

> LinearSVC Python API
> 
>
> Key: SPARK-19336
> URL: https://issues.apache.org/jira/browse/SPARK-19336
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Reporter: Joseph K. Bradley
>Assignee: Miao Wang
> Fix For: 2.2.0
>
>
> Create a Python wrapper for spark.ml.classification.LinearSVC



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org