[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2020-03-17 Thread Sivaprasanna Sethuraman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060796#comment-17060796
 ] 

Sivaprasanna Sethuraman edited comment on FLINK-10114 at 3/17/20, 10:19 AM:


Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's 
existing low-level components, such as PhysicalFsWriter. So far I have tested 
it only on a local cluster; it hasn't been tested on a production or 
staging-equivalent cluster yet.

Maybe I can commit the code and you guys can take a look and review whether we 
can go ahead with this design/approach? If things have to be changed, either 
slightly or drastically, I am still glad to work on this, since we are pushing 
for an ORC bulk writer for StreamingFileSink on our side. Thoughts?
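One detail any stream-based ORC writer has to handle: ORC records the byte offset of every stripe in the file footer, so the writer must know how many bytes it has emitted so far. When the sink hands over a plain stream instead of a file opened through a Hadoop FileSystem, that offset has to be tracked by hand. The following is a minimal sketch under that assumption; `PositionTrackingOutputStream` is a hypothetical helper, not a Flink or ORC class, and not the code from the rough cut.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical wrapper (not part of Flink or ORC): counts the bytes written to a
 * caller-supplied stream so a format writer can ask for its current offset.
 */
public class PositionTrackingOutputStream extends FilterOutputStream {
    private long pos;

    public PositionTrackingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        pos++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        // Forward the whole slice instead of FilterOutputStream's byte-by-byte default.
        out.write(b, off, len);
        pos += len;
    }

    /** Number of bytes written through this wrapper so far. */
    public long getPos() {
        return pos;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PositionTrackingOutputStream out = new PositionTrackingOutputStream(sink);
        out.write("ORC".getBytes(StandardCharsets.US_ASCII)); // ORC files start with this 3-byte magic
        long stripeOffset = out.getPos(); // offset a footer entry would record
        out.write(new byte[] {1, 2, 3, 4}); // placeholder bytes standing in for stripe data
        System.out.println("stripe starts at " + stripeOffset + ", total " + out.getPos());
    }
}
```

Note that closing this wrapper also closes the underlying stream, so in a sink that owns the stream the writer should flush it but leave closing to the caller.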


was (Author: zenfenan):
Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's 
existing low-level components, such as PhysicalFsWriter. So far I have tested 
it only on a local cluster; it hasn't been tested on a production or 
staging-equivalent cluster yet.

Maybe I can commit the code and you guys can take a look and review whether we 
can go ahead with this design/approach. If things have to be changed, either 
slightly or drastically, I am still glad to work on this, since we are pushing 
for an ORC bulk writer for StreamingFileSink on our side. Thoughts?

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2020-03-17 Thread Sivaprasanna Sethuraman (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060796#comment-17060796
 ] 

Sivaprasanna Sethuraman edited comment on FLINK-10114 at 3/17/20, 10:18 AM:


Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's 
existing low-level components, such as PhysicalFsWriter. So far I have tested 
it only on a local cluster; it hasn't been tested on a production or 
staging-equivalent cluster yet.

Maybe I can commit the code and you guys can take a look and review whether we 
can go ahead with this design/approach. If things have to be changed, either 
slightly or drastically, I am still glad to work on this, since we are pushing 
for an ORC bulk writer for StreamingFileSink on our side. Thoughts?


was (Author: zenfenan):
Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's 
existing low-level components, such as PhysicalFsWriter. So far I have tested 
it only on a local cluster; it hasn't been tested on a production or 
staging-equivalent cluster yet.

Maybe I can commit the code and you guys can take a look and review whether we 
can go ahead with this design/approach. If things have to be changed, either 
slightly or drastically, I would still gladly work on this, since we are 
pushing for an ORC bulk writer for StreamingFileSink on our side. Thoughts?







[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2020-03-09 Thread Yun Gao (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054920#comment-17054920
 ] 

Yun Gao edited comment on FLINK-10114 at 3/9/20, 1:01 PM:
--

It seems that _{{StreamingFileSink}}_ currently does indeed require the writer 
to be able to write to an {{_OutputStream_}} so that it can work with the 
recovery mechanism, but the official ORC library does not support this.

To overcome this problem, one option is to modify the current 
StreamingFileSink, but that seems a bit complex. Another option, which seems 
simpler, might be to use 
[presto-orc|https://github.com/prestodb/presto/tree/master/presto-orc] 
directly, since it enables writing to an OutputStream. We could add a new 
shaded module to hide the dependency on presto-orc.
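To make the constraint concrete: as we understand it, the sink opens and owns each part-file stream (which is what lets it persist and recover in-progress files) and passes that stream to a writer created by a factory, whereas ORC's `OrcFile.createWriter` opens its own file from a `Path`. The following is a minimal, dependency-free sketch of that contract. `BulkWriter` here is a local stand-in modeled on Flink's `BulkWriter` (`addElement`, `flush`, `finish`, plus a `Factory`), using `java.io.OutputStream` instead of Flink's `FSDataOutputStream`; `LineWriter` is a hypothetical toy format, not ORC.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamOnlyWriterSketch {

    /** Local stand-in modeled on Flink's BulkWriter; NOT the real Flink type. */
    public interface BulkWriter<T> {
        void addElement(T element) throws IOException;

        void flush() throws IOException;

        /** Writes trailing data such as a footer; must not close the stream. */
        void finish() throws IOException;

        /** The sink calls this with a stream it opened and will close itself. */
        interface Factory<T> {
            BulkWriter<T> create(OutputStream out) throws IOException;
        }
    }

    /** Hypothetical toy format: one line per record plus a count footer. */
    public static final class LineWriter implements BulkWriter<String> {
        private final OutputStream out;
        private long count;

        public LineWriter(OutputStream out) {
            this.out = out;
        }

        @Override
        public void addElement(String element) throws IOException {
            out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
            count++;
        }

        @Override
        public void flush() throws IOException {
            out.flush();
        }

        @Override
        public void finish() throws IOException {
            // A columnar format such as ORC would write its footer here.
            out.write(("#count=" + count + "\n").getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        BulkWriter.Factory<String> factory = LineWriter::new;
        ByteArrayOutputStream partFile = new ByteArrayOutputStream(); // owned by the "sink"
        BulkWriter<String> writer = factory.create(partFile);
        writer.addElement("a");
        writer.addElement("b");
        writer.finish();
        partFile.close(); // closing is the caller's job, not the writer's
        System.out.print(partFile.toString("UTF-8")); // prints "a", "b", "#count=2" on separate lines
    }
}
```

Because `finish()` writes the footer but leaves closing to the caller, the sink stays in control of when the part file is committed; a stream-based ORC writer (whether built on PhysicalFsWriter or presto-orc) would have to fit the same shape.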


was (Author: gaoyunhaii):
It seems that _{{StreamingFileSink}}_ currently requires the writer to be able 
to write to an {{_OutputStream_}} so that it can work with the recovery 
mechanism, but the official ORC library does not support this.

To overcome this problem, one option is to modify the current 
StreamingFileSink, but it is a bit complex. Another, simpler option is to use 
[presto-orc|https://github.com/prestodb/presto/tree/master/presto-orc], which 
enables writing to an OutputStream. We could add a new shaded module to hide 
the dependency on presto-orc.







[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2019-08-09 Thread hiliuxg (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904323#comment-16904323
 ] 

hiliuxg edited comment on FLINK-10114 at 8/10/19 5:04 AM:
--

Hi, [~kien_truong]. I tried, but the StreamingFileSink base API makes it hard 
to implement an ORC file sink. Thank you, I will try again.


was (Author: hiliuxg):
I tried, but the StreamingFileSink base API makes it hard to implement an ORC 
file sink. Thank you, I will try again.





--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink

2019-05-17 Thread Manish Bellani (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842336#comment-16842336
 ] 

Manish Bellani edited comment on FLINK-10114 at 5/17/19 5:33 PM:
-

Hello, is this still planned to be supported for StreamingFileSink? It appears 
that changes to the ORC writer API would be needed in order to support this? 
I'd love to hear whether there's work going on in this area, since we write 
data in ORC format and are looking to stream ORC data to S3.


was (Author: mbellani):
Hello, is this still planned to be supported for StreamingFileSink? It appears 
that changes to the ORC writer API would be needed in order to support this? 
I'd love to hear whether there's work going on in this area, since we write 
data in ORC format and are looking for streaming ORC data in S3.





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)