[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060796#comment-17060796 ]

Sivaprasanna Sethuraman edited comment on FLINK-10114 at 3/17/20, 10:19 AM:

Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's existing low-level implementations such as PhysicalFsWriter. So far I have tested it on a local cluster on my end. It hasn't been tested on a production or staging-equivalent cluster yet. Maybe I can commit the code and you guys can take a look and review whether we can go ahead with this design/approach? If things have to be changed either slightly or drastically, I am still gladly willing to work on this since we are pushing for an ORC BulkWriter for StreamingFileSink on our side.

Thoughts?

was (Author: zenfenan):

Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's existing low-level implementations such as PhysicalFsWriter. So far I have tested it on a local cluster on my end. It hasn't been tested on a production or staging-equivalent cluster yet. Maybe I can commit the code and you guys can take a look and review whether we can go ahead with this design/approach. If things have to be changed either slightly or drastically, I am still gladly willing to work on this since we are pushing for an ORC BulkWriter for StreamingFileSink on our side.

Thoughts?

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17060796#comment-17060796 ]

Sivaprasanna Sethuraman edited comment on FLINK-10114 at 3/17/20, 10:18 AM:

Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's existing low-level implementations such as PhysicalFsWriter. So far I have tested it on a local cluster on my end. It hasn't been tested on a production or staging-equivalent cluster yet. Maybe I can commit the code and you guys can take a look and review whether we can go ahead with this design/approach. If things have to be changed either slightly or drastically, I am still gladly willing to work on this since we are pushing for an ORC BulkWriter for StreamingFileSink on our side.

Thoughts?

was (Author: zenfenan):

Hi [~gaoyunhaii] and [~kkl0u],

Thanks for the response. I have worked on a rough cut which leverages ORC's existing low-level implementations such as PhysicalFsWriter. So far I have tested it on a local cluster on my end. It hasn't been tested on a production or staging-equivalent cluster yet. Maybe I can commit the code and you guys can take a look and review whether we can go ahead with this design/approach. If things have to be changed either slightly or drastically, I would still gladly work on this since we are pushing for an ORC BulkWriter for StreamingFileSink on our side.

Thoughts?

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17054920#comment-17054920 ]

Yun Gao edited comment on FLINK-10114 at 3/9/20, 1:01 PM:

It seems that _{{StreamingFileSink}}_ currently does require the writer to be able to write to an {{_OutputStream_}} so that it can work with the recovery mechanism, but the official ORC library does not support this.

To overcome this problem, one option is to modify the current StreamingFileSink, but that seems to be a bit complex. Another option, which seems simpler, is to use [presto-orc|https://github.com/prestodb/presto/tree/master/presto-orc] directly, since it enables writing to an OutputStream. We could add a new shaded module to hide the dependency on presto-orc.

was (Author: gaoyunhaii):

It seems that _{{StreamingFileSink}}_ currently requires the writer to be able to write to an {{_OutputStream_}} so that it can work with the recovery mechanism, but the official ORC library does not support this.

To overcome this problem, one option is to modify the current StreamingFileSink, but that is a bit complex. Another, simpler option is to use [presto-orc|https://github.com/prestodb/presto/tree/master/presto-orc], which enables writing to an OutputStream. We could add a new shaded module to hide the dependency on presto-orc.

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major

--
This message was sent by Atlassian Jira (v8.3.4#803005)
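
For readers following the thread, the OutputStream constraint described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design proposed in the ticket: `SimpleBulkWriter`, `SimpleBulkWriterFactory`, and `LineWriter` are hypothetical stand-ins that loosely mirror the shape of a bulk writer (`addElement`, `flush`, `finish`) and use only `java.io`, so the example has no Flink or ORC dependency. The point it tries to show is that the sink creates and owns the stream, and the writer may only write into it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class OutputStreamBulkWriterSketch {

    /** Hypothetical stand-in loosely mirroring a bulk writer (addElement/flush/finish). */
    interface SimpleBulkWriter<T> {
        void addElement(T element) throws IOException;
        void flush() throws IOException;
        void finish() throws IOException;
    }

    /** Hypothetical factory: the sink hands over a stream that it created and owns. */
    interface SimpleBulkWriterFactory<T> {
        SimpleBulkWriter<T> create(OutputStream out) throws IOException;
    }

    /** Toy line-oriented writer; a real ORC writer would encode stripes and a footer instead. */
    static final class LineWriter implements SimpleBulkWriter<String> {
        private final OutputStream out;

        LineWriter(OutputStream out) {
            this.out = out;
        }

        @Override
        public void addElement(String element) throws IOException {
            out.write((element + "\n").getBytes(StandardCharsets.UTF_8));
        }

        @Override
        public void flush() throws IOException {
            out.flush();
        }

        @Override
        public void finish() throws IOException {
            // Flush only. The stream is deliberately not closed here, because in this
            // sketch the sink (the caller) owns the stream and decides when to close it.
            out.flush();
        }
    }

    /** Plays the role of the sink: creates the stream, drives the writer, returns the bytes. */
    static byte[] writeAll(List<String> records) throws IOException {
        ByteArrayOutputStream sinkOwnedStream = new ByteArrayOutputStream();
        SimpleBulkWriterFactory<String> factory = LineWriter::new;
        SimpleBulkWriter<String> writer = factory.create(sinkOwnedStream);
        for (String record : records) {
            writer.addElement(record);
        }
        writer.finish();
        return sinkOwnedStream.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = writeAll(Arrays.asList("a", "b"));
        System.out.println(bytes.length + " bytes written");
    }
}
```

A writer that insists on opening its own file from a path cannot be plugged into this shape, which is the mismatch with the official ORC library mentioned in the comment.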
[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904323#comment-16904323 ]

hiliuxg edited comment on FLINK-10114 at 8/10/19 5:04 AM:

Hi [~kien_truong],

I tried, but it is hard to implement the ORC file sink on top of the StreamingFileSink base API. Thank you, I will try again.

was (Author: hiliuxg):

I tried, but it is hard to implement the ORC file sink on top of the StreamingFileSink base API. Thank you, I will try again.

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major

--
This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (FLINK-10114) Support Orc for StreamingFileSink
[ https://issues.apache.org/jira/browse/FLINK-10114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842336#comment-16842336 ]

Manish Bellani edited comment on FLINK-10114 at 5/17/19 5:33 PM:

Hello, is this still planned to be supported for StreamingFileSink? It appears that changes to the orc-writer API would be needed in order to support this? I'd love to hear if there's work going on in this area, since we write data in ORC format and are looking to stream ORC data to S3.

was (Author: mbellani):

Hello, is this still planned to be supported for StreamingFileSink? It appears that changes to the orc-writer API would be needed in order to support this? I'd love to hear if there's work going on in this area, since we write data in ORC format and are looking to stream ORC data in S3.

> Support Orc for StreamingFileSink
> ---------------------------------
>
> Key: FLINK-10114
> URL: https://issues.apache.org/jira/browse/FLINK-10114
> Project: Flink
> Issue Type: Sub-task
> Components: Connectors / FileSystem
> Reporter: zhangminglei
> Assignee: vinoyang
> Priority: Major

--
This message was sent by Atlassian JIRA (v7.6.3#76005)