[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In [SPARK-31953|https://issues.apache.org/jira/browse/SPARK-31953], Structured
Streaming support was added to the history server, and a "Structured Streaming"
tab appears in the history UI when a streaming query is present. However, even
though a store exists for this data and it is presented in the UI, it is not
exposed through a REST API. This data can be used for monitoring, detecting
streaming, and building custom dashboards. This monitoring API will be similar
to the existing monitoring APIs for DStreams; see
[SPARK-18470|https://issues.apache.org/jira/browse/SPARK-18470].

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
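A minimal Python sketch of a Summary API client, using only the standard library. It assumes the endpoint is mounted under Spark's existing REST prefix, i.e. http://<host>:<port>/api/v1/applications/<appId>/sql/streamingqueries, and that each StreamingQueryData entry is serialized with _runId_ and _isActive_ fields. The prefix, the field names, and the helper names (summary_url, fetch_summary, active_run_ids) are assumptions for illustration, not part of this proposal.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: the new endpoints live under Spark's existing REST prefix.
API_PREFIX = "/api/v1/applications"


def summary_url(base_url: str, app_id: str) -> str:
    """Build the Summary API URL for one application."""
    return (f"{base_url.rstrip('/')}{API_PREFIX}/"
            f"{quote(app_id, safe='')}/sql/streamingqueries")


def fetch_summary(base_url: str, app_id: str, timeout: float = 10.0) -> list:
    """GET the summary and decode the JSON list of StreamingQueryData entries."""
    with urllib.request.urlopen(summary_url(base_url, app_id),
                                timeout=timeout) as resp:
        return json.load(resp)


def active_run_ids(summary: list) -> list:
    """Return the runIds of queries marked active (field names are assumed)."""
    return [q["runId"] for q in summary if q.get("isActive")]
```

Against a history server on its default port (18080), calling fetch_summary("http://localhost:18080", app_id) and then active_run_ids on the result would yield the runIds to pass to the Progress API.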
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.
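A minimal Python sketch of a Progress API client that passes the _last_ query parameter and mirrors the proposed default of 1. The /api/v1/applications prefix and the helper names (progress_url, fetch_progress) are assumptions for illustration. Rejecting last < 1 is a client-side choice in this sketch, since the proposal does not say how the server treats such values.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumption: the new endpoints live under Spark's existing REST prefix.
API_PREFIX = "/api/v1/applications"


def progress_url(base_url: str, app_id: str, run_id: str, last: int = 1) -> str:
    """Build the Progress API URL; `last` defaults to 1 as in the proposal."""
    if last < 1:
        # Client-side guard only; server behavior for last < 1 is unspecified.
        raise ValueError("last must be at least 1")
    path = (f"{base_url.rstrip('/')}{API_PREFIX}/{quote(app_id, safe='')}"
            f"/sql/streamingqueries/{quote(run_id, safe='')}/progress")
    return f"{path}?{urlencode({'last': last})}"


def fetch_progress(base_url: str, app_id: str, run_id: str, last: int = 1,
                   timeout: float = 10.0) -> list:
    """GET the `last` most recent events as a JSON list of StreamingQueryProgress."""
    url = progress_url(base_url, app_id, run_id, last)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

Keeping URL construction separate from the HTTP call makes it easy to poll several runIds with different _last_ values from one place.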

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.
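Because the responses reuse the existing StreamingQueryProgress structure, a client can read the JSON fields Spark already emits for progress events, such as batchId, numInputRows, inputRowsPerSecond and processedRowsPerSecond. The sketch below shows one hypothetical dashboard check built on those fields. The "falling behind" heuristic, the 0.9 tolerance and the helper names are illustrative assumptions, and the rate fields are treated as optional because they may be absent from the JSON.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProgressSample:
    batch_id: int
    num_input_rows: int
    input_rows_per_second: Optional[float]
    processed_rows_per_second: Optional[float]


def parse_progress(event: dict) -> ProgressSample:
    """Extract a few fields from one StreamingQueryProgress JSON object.

    Rate fields are read with .get() and left as None when absent.
    """
    return ProgressSample(
        batch_id=int(event["batchId"]),
        num_input_rows=int(event.get("numInputRows", 0)),
        input_rows_per_second=event.get("inputRowsPerSecond"),
        processed_rows_per_second=event.get("processedRowsPerSecond"),
    )


def falling_behind(events: list, tolerance: float = 0.9) -> bool:
    """Hypothetical heuristic: True if every batch with known, non-zero rates
    processed rows slower than `tolerance` times the rate they arrived at."""
    samples = [parse_progress(e) for e in events]
    rated = [s for s in samples
             if s.input_rows_per_second and s.processed_rows_per_second is not None]
    return bool(rated) and all(
        s.processed_rows_per_second < tolerance * s.input_rows_per_second
        for s in rated
    )
```

Requiring every sampled batch to lag, rather than just one, avoids flagging a query because of a single slow micro-batch.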

Attached sample I/O and screenshots.

  was:
In SPARK-31953, Structured Streaming support was added to the history server,
and a "Structured Streaming" tab appears in the history UI when a streaming
query is present. However, even though a store exists for this data and it is
presented in the UI, it is not exposed through a REST API. This data can be
used for monitoring, detecting streaming, and building custom dashboards. This
monitoring API will be similar to the existing monitoring APIs for DStreams;
see SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Attached sample I/O and screenshots.


> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major
> Attachments: StreamingAPI-SS1.jpg, StreamingAPI-SS2.jpg, 
> StreamingAPIsSampleIO.txt

[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In SPARK-31953, Structured Streaming support was added to the history server,
and a "Structured Streaming" tab appears in the history UI when a streaming
query is present. However, even though a store exists for this data and it is
presented in the UI, it is not exposed through a REST API. This data can be
used for monitoring, detecting streaming, and building custom dashboards. This
monitoring API will be similar to the existing monitoring APIs for DStreams;
see SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Attached sample I/O and screenshots.

  was:
In [SPARK-31953|https://issues.apache.org/jira/browse/SPARK-31953], Structured
Streaming support was added to the history server, and a "Structured Streaming"
tab appears in the history UI when a streaming query is present. However, even
though a store exists for this data and it is presented in the UI, it is not
exposed through a REST API. This data can be used for monitoring, detecting
streaming, and building custom dashboards. This monitoring API will be similar
to the existing monitoring APIs for DStreams; see
[SPARK-18470|https://issues.apache.org/jira/browse/SPARK-18470].

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.


> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major
> Attachments: StreamingAPI-SS1.jpg, StreamingAPI-SS2.jpg, 
> StreamingAPIsSampleIO.txt

[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Attachment: StreamingAPI-SS1.jpg
StreamingAPI-SS2.jpg

> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major
> Attachments: StreamingAPI-SS1.jpg, StreamingAPI-SS2.jpg, 
> StreamingAPIsSampleIO.txt



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Attachment: StreamingAPIsSampleIO.txt

> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major
> Attachments: StreamingAPIsSampleIO.txt



[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In [SPARK-31953|https://issues.apache.org/jira/browse/SPARK-31953], Structured
Streaming support was added to the history server, and a "Structured Streaming"
tab appears in the history UI when a streaming query is present. However, even
though a store exists for this data and it is presented in the UI, it is not
exposed through a REST API. This data can be used for monitoring, detecting
streaming, and building custom dashboards. This monitoring API will be similar
to the existing monitoring APIs for DStreams; see
[SPARK-18470|https://issues.apache.org/jira/browse/SPARK-18470].

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.

  was:
In [SPARK-31953 Add Spark Structured Streaming History Server Support - ASF
JIRA (apache.org)], Structured Streaming support was added to the history
server, and a "Structured Streaming" tab appears in the history UI when a
streaming query is present. However, even though a store exists for this data
and it is presented in the UI, it is not exposed through a REST API. This data
can be used for monitoring, detecting streaming, and building custom
dashboards. This monitoring API will be similar to the existing monitoring APIs
for DStreams; see [SPARK-18470 Provide Spark Streaming Monitor Rest Api - ASF
JIRA (apache.org)].

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.


> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major

[jira] [Comment Edited] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493506#comment-17493506
 ] 

Karthik Subramanian edited comment on SPARK-38234 at 2/16/22, 9:04 PM:
---

Thanks. I will send the PR soon.


was (Author: JIRAUSER285342):
I will send the PR soon.

> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major



[jira] [Commented] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17493506#comment-17493506
 ] 

Karthik Subramanian commented on SPARK-38234:
-

I will send the PR soon.

> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major



[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In [SPARK-31953 Add Spark Structured Streaming History Server Support - ASF
JIRA (apache.org)], Structured Streaming support was added to the history
server, and a "Structured Streaming" tab appears in the history UI when a
streaming query is present. However, even though a store exists for this data
and it is presented in the UI, it is not exposed through a REST API. This data
can be used for monitoring, detecting streaming, and building custom
dashboards. This monitoring API will be similar to the existing monitoring APIs
for DStreams; see [SPARK-18470 Provide Spark Streaming Monitor Rest Api - ASF
JIRA (apache.org)].

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.

  was:
In [SPARK-31953] Add Spark Structured Streaming History Server Support - ASF
JIRA (apache.org), Structured Streaming support was added to the history
server, and a "Structured Streaming" tab appears in the history UI when a
streaming query is present. However, even though a store exists for this data
and it is presented in the UI, it is not exposed through a REST API. This data
can be used for monitoring, detecting streaming, and building custom
dashboards. This monitoring API will be similar to the existing monitoring APIs
for DStreams; see [SPARK-18470] Provide Spark Streaming Monitor Rest Api - ASF
JIRA (apache.org).

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.


> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.3.0
>Reporter: Karthik Subramanian
>Priority: Major

[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In [SPARK-31953] Add Spark Structured Streaming History Server Support - ASF
JIRA (apache.org), Structured Streaming support was added to the history
server, and a "Structured Streaming" tab appears in the history UI when a
streaming query is present. However, even though a store exists for this data
and it is presented in the UI, it is not exposed through a REST API. This data
can be used for monitoring, detecting streaming, and building custom
dashboards. This monitoring API will be similar to the existing monitoring APIs
for DStreams; see [SPARK-18470] Provide Spark Streaming Monitor Rest Api - ASF
JIRA (apache.org).

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

Lists a summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

Lists the progress events of a specific streaming query, identified by its
{_}runId{_}.

Users can also specify how many of the most recent events to retrieve with the
_last_ query parameter. By default, only the most recent progress event is
returned, i.e. _last_ defaults to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N\}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* No new response objects are introduced. The APIs return the data from
the store without aggregation, so the responses reuse the existing event
structures.

Will attach sample I/O.

  was:
In SPARK-31953, Structured Streaming support was added to the history server,
and a "Structured Streaming" tab appears in the history UI when a streaming
query is present. However, even though a store exists for this data and it is
presented in the UI, it is not exposed through a REST API. This data can be
used for monitoring, detecting streaming, and building custom dashboards. This
monitoring API will be similar to the existing monitoring APIs for DStreams;
see SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

To list the summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

To list the progress events of a specific streaming query by {_}runId{_}. 

Users can also specify how many of the most recent events to retrieve by using 
the _last_ query parameter. By default, only the most recent progress event is 
returned, i.e. _last_ is set to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* We are not introducing new objects for the response: since we are just 
returning the data from the store without aggregation, the responses use the 
existing event structures.

Will attach sample I/O.



> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Karthik Subramanian
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> In SPARK-31953, Structured Streaming was added to the history server, and a 
> "Structured Streaming" tab appears in the history UI when a streaming query 
> is present. However, even though a store exists for this data and it is 
> presented in the UI, it is not exposed through a REST API. This data could be 
> used for monitoring, detecting problems with streaming queries, and building 
> custom dashboards. This monitoring API will be similar to the monitoring APIs 
> that exist for DStreams; refer to SPARK-18470.
> In this change, we plan to add two simple APIs that expose the data in the 
> store and can be used to monitor streaming queries. 
> h3. *Summary API*
> To list the summary of all existing streaming queries.
> GET {{/\{appId}/sql/streamingqueries}}
> The response is a list of {_}StreamingQueryData{_}.
> h3. *Progress API*
> To list the progress events of a specific streaming query by {_}runId{_}. 
> Users can also specify how many of the most recent events to retrieve by 
> using the _last_ query parameter. By default, only the most recent progress 
> event is returned, i.e. _last_ is set to 1.
> GET 

[jira] [Updated] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-38234:

Description: 
In SPARK-31953, Structured Streaming was added to the history server, and a 
"Structured Streaming" tab appears in the history UI when a streaming query is 
present. However, even though a store exists for this data and it is presented 
in the UI, it is not exposed through a REST API. This data could be used for 
monitoring, detecting problems with streaming queries, and building custom 
dashboards. This monitoring API will be similar to the monitoring APIs that 
exist for DStreams; refer to SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

To list the summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

To list the progress events of a specific streaming query by {_}runId{_}. 

Users can also specify how many of the most recent events to retrieve by using 
the _last_ query parameter. By default, only the most recent progress event is 
returned, i.e. _last_ is set to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* We are not introducing new objects for the response: since we are just 
returning the data from the store without aggregation, the responses use the 
existing event structures.

Will attach sample I/O.


  was:
In [SPARK-31953], Structured Streaming was added to the history server, and a 
"Structured Streaming" tab appears in the history UI when a streaming query is 
present. However, even though a store exists for this data and it is presented 
in the UI, it is not exposed through a REST API. This data could be used for 
monitoring, detecting problems with streaming queries, and building custom 
dashboards. This monitoring API will be similar to the monitoring APIs that 
exist for DStreams; refer to SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

To list the summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

To list the progress events of a specific streaming query by {_}runId{_}. 

Users can also specify how many of the most recent events to retrieve by using 
the _last_ query parameter. By default, only the most recent progress event is 
returned, i.e. _last_ is set to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* We are not introducing new objects for the response: since we are just 
returning the data from the store without aggregation, the responses use the 
existing event structures.

Will attach sample I/O.



> Provide monitoring REST API for Structured Streaming
> 
>
> Key: SPARK-38234
> URL: https://issues.apache.org/jira/browse/SPARK-38234
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.2, 3.2.0
>Reporter: Karthik Subramanian
>Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> In SPARK-31953, Structured Streaming was added to the history server, and a 
> "Structured Streaming" tab appears in the history UI when a streaming query 
> is present. However, even though a store exists for this data and it is 
> presented in the UI, it is not exposed through a REST API. This data could be 
> used for monitoring, detecting problems with streaming queries, and building 
> custom dashboards. This monitoring API will be similar to the monitoring APIs 
> that exist for DStreams; refer to SPARK-18470.
> In this change, we plan to add two simple APIs that expose the data in the 
> store and can be used to monitor streaming queries. 
> h3. *Summary API*
> To list the summary of all existing streaming queries.
> GET {{/\{appId}/sql/streamingqueries}}
> The response is a list of {_}StreamingQueryData{_}.
> h3. *Progress API*
> To list the progress events of a specific streaming query by {_}runId{_}. 
> Users can also specify how many of the most recent events to retrieve by 
> using the _last_ query parameter. By default, only the most recent progress 
> event is returned, i.e. _last_ is set to 1.
> GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N}}}
> The response is a list of {_}StreamingQueryProgress{_}.
> *Note:* We are not introducing new objects for the response: since we are 
> just returning the data from the store without aggregation, the responses use 
> the existing event structures.
> Will attach sample I/O.
> 

[jira] [Created] (SPARK-38234) Provide monitoring REST API for Structured Streaming

2022-02-16 Thread Karthik Subramanian (Jira)
Karthik Subramanian created SPARK-38234:
---

 Summary: Provide monitoring REST API for Structured Streaming
 Key: SPARK-38234
 URL: https://issues.apache.org/jira/browse/SPARK-38234
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.2.0, 3.1.2
Reporter: Karthik Subramanian
 Fix For: 3.3.0, 3.1.4, 3.2.2


In [SPARK-31953], Structured Streaming was added to the history server, and a 
"Structured Streaming" tab appears in the history UI when a streaming query is 
present. However, even though a store exists for this data and it is presented 
in the UI, it is not exposed through a REST API. This data could be used for 
monitoring, detecting problems with streaming queries, and building custom 
dashboards. This monitoring API will be similar to the monitoring APIs that 
exist for DStreams; refer to SPARK-18470.

In this change, we plan to add two simple APIs that expose the data in the 
store and can be used to monitor streaming queries. 
h3. *Summary API*

To list the summary of all existing streaming queries.

GET {{/\{appId}/sql/streamingqueries}}

The response is a list of {_}StreamingQueryData{_}.
h3. *Progress API*

To list the progress events of a specific streaming query by {_}runId{_}. 

Users can also specify how many of the most recent events to retrieve by using 
the _last_ query parameter. By default, only the most recent progress event is 
returned, i.e. _last_ is set to 1.

GET {{/\{appId}/sql/streamingqueries/\{runId}/progress?last=\{N}}}

The response is a list of {_}StreamingQueryProgress{_}.

*Note:* We are not introducing new objects for the response: since we are just 
returning the data from the store without aggregation, the responses use the 
existing event structures.

Will attach sample I/O.




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4

2015-06-22 Thread Karthik Subramanian (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14597179#comment-14597179
 ] 

Karthik Subramanian commented on SPARK-8427:


Hi Michael,
Setting spark.sql.hive.convertMetastoreParquet to false does NOT help. I am 
still facing the same issue.

 Incorrect ACL checking for partitioned table in Spark SQL-1.4
 -

 Key: SPARK-8427
 URL: https://issues.apache.org/jira/browse/SPARK-8427
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: CentOS 6 and OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0
Reporter: Karthik Subramanian
Priority: Critical
  Labels: security

 Problem Statement:
 When querying a partitioned table using Spark SQL (version 1.4.0), an access 
 denied exception is observed on a partition the user does not belong to 
 (user permissions are controlled using HDFS ACLs). The same query works 
 correctly in Hive.
 Use case: To address multitenancy
 Consider a table containing multiple customers, each with multiple 
 facilities. The table is partitioned by customer and facility. A user 
 belonging to one facility does not have access to other facilities. This is 
 enforced using HDFS ACLs on the corresponding directories. When querying the 
 table as ‘user1’ (belonging to ‘facility1’ and ‘customer1’) on that particular 
 partition (using a ‘where’ clause), only the corresponding directory access 
 should be verified, not the entire table. 
 The above use case works as expected when using the Hive client, versions 
 0.13.1 and 1.1.0. 
 The query used: select count(*) from customertable where customer='customer1' 
 and facility='facility1'
 Below is the exception received in Spark-shell:
 org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode=/data/customertable/customer=customer2/facility=facility2”:root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
   at
 

[jira] [Created] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4

2015-06-17 Thread Karthik Subramanian (JIRA)
Karthik Subramanian created SPARK-8427:
--

 Summary: Incorrect ACL checking for partitioned table in Spark SQL-1.4
 Key: SPARK-8427
 URL: https://issues.apache.org/jira/browse/SPARK-8427
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: CentOS 6, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0
Reporter: Karthik Subramanian
Priority: Blocker


Problem Statement:
When querying a partitioned table using Spark SQL (version 1.4.0), an access 
denied exception is observed on a partition the user does not belong to (user 
permissions are controlled using HDFS ACLs). The same query works correctly in 
Hive.

Use case: To address multitenancy

Consider a table containing multiple customers, each with multiple facilities. 
The table is partitioned by customer and facility. A user belonging to one 
facility does not have access to other facilities. This is enforced using HDFS 
ACLs on the corresponding directories. When querying the table as ‘user1’ 
(belonging to ‘facility1’ and ‘customer1’) on that particular partition (using 
a ‘where’ clause), only the corresponding directory access should be verified, 
not the entire table. 
The above use case works as expected when using the Hive client, versions 
0.13.1 and 1.1.0. 
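The expected behaviour (only the matching partition directory is accessed) can be illustrated with a toy model of partition pruning over Hive-style {{key=value}} directories. This is a hypothetical sketch of the idea, not Spark's or Hive's planner code. It only handles equality filters, and the directory list is made up to mirror the example in this report.

```python
from typing import Dict, List


def parse_partition_values(path: str) -> Dict[str, str]:
    """Extract key=value partition columns from a Hive-style directory path."""
    values = {}
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune_partitions(partition_dirs: List[str], filters: Dict[str, str]) -> List[str]:
    """Keep only directories whose partition values match every equality filter.

    Only these directories would need to be listed (and ACL-checked); the
    remaining partitions of the table are never touched.
    """
    selected = []
    for path in partition_dirs:
        values = parse_partition_values(path)
        if all(values.get(col) == wanted for col, wanted in filters.items()):
            selected.append(path)
    return selected


# Hypothetical partition layout for customertable.
dirs = [
    "/data/customertable/customer=customer1/facility=facility1",
    "/data/customertable/customer=customer2/facility=facility2",
    "/data/customertable/customer=customer1/facility=facility3",
]
print(prune_partitions(dirs, {"customer": "customer1", "facility": "facility1"}))
# ['/data/customertable/customer=customer1/facility=facility1']
```

Under this model, the query above never needs to list {{customer=customer2/facility=facility2}}, which is the directory named in the exception below.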

The query used: select count(*) from customertable where customer='customer1' 
and facility='facility1'

Below is the exception received in Spark-shell:

org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode=/data/customertable/customer=customer2/facility=facility2”:root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:755)
at org.apache.hadoop.hdfs.DistributedFileSystem$15.doCall(DistributedFileSystem.java:751)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at

[jira] [Updated] (SPARK-8427) Incorrect ACL checking for partitioned table in Spark SQL-1.4

2015-06-17 Thread Karthik Subramanian (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Subramanian updated SPARK-8427:
---
Environment: CentOS 6 and OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0
(was: CentOS 6, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0)

 Incorrect ACL checking for partitioned table in Spark SQL-1.4
 -

 Key: SPARK-8427
 URL: https://issues.apache.org/jira/browse/SPARK-8427
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.4.0
 Environment: CentOS 6 and OS X 10.9.5, Hive-0.13.1, Spark-1.4, Hadoop 2.6.0
Reporter: Karthik Subramanian
Priority: Blocker
  Labels: security

 Problem Statement:
 When querying a partitioned table using Spark SQL (version 1.4.0), an access 
 denied exception is observed on a partition the user does not belong to 
 (user permissions are controlled using HDFS ACLs). The same query works 
 correctly in Hive.
 Use case: To address multitenancy
 Consider a table containing multiple customers, each with multiple 
 facilities. The table is partitioned by customer and facility. A user 
 belonging to one facility does not have access to other facilities. This is 
 enforced using HDFS ACLs on the corresponding directories. When querying the 
 table as ‘user1’ (belonging to ‘facility1’ and ‘customer1’) on that particular 
 partition (using a ‘where’ clause), only the corresponding directory access 
 should be verified, not the entire table. 
 The above use case works as expected when using the Hive client, versions 
 0.13.1 and 1.1.0. 
 The query used: select count(*) from customertable where customer='customer1' 
 and facility='facility1'
 Below is the exception received in Spark-shell:
 org.apache.hadoop.security.AccessControlException: Permission denied: user=user1, access=READ_EXECUTE, inode=/data/customertable/customer=customer2/facility=facility2”:root:supergroup:drwxrwx---:group::r-x,group:facility2:rwx
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkAccessAcl(FSPermissionChecker.java:351)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:253)
   at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:185)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6512)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6494)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPathAccess(FSNamesystem.java:6419)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListingInt(FSNamesystem.java:4954)
   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getListing(FSNamesystem.java:4915)
   at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getListing(NameNodeRpcServer.java:826)
   at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getListing(ClientNamenodeProtocolServerSideTranslatorPB.java:612)
   at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
   at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
   at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1971)
   at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1952)
   at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:693)
   at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:105)
   at