Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files
Thanks very much for the pointers, Vinay. That helps ☺

-Raja.

From: vinay patil <vinay18.pa...@gmail.com>
Date: Monday, August 7, 2017 at 1:56 AM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Hi Raja,

That is why they are in the pending state. You can enable checkpointing by calling env.enableCheckpointing() with a checkpoint interval in milliseconds. After doing this, the files will not remain in the pending state.

Check this out: https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

Regards,
Vinay Patil
Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files
Hi Raja,

That is why they are in the pending state. You can enable checkpointing by calling env.enableCheckpointing() with a checkpoint interval in milliseconds. After doing this, the files will not remain in the pending state.

Check this out: https://ci.apache.org/projects/flink/flink-docs-release-1.3/api/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSink.html

Regards,
Vinay Patil

On Mon, Aug 7, 2017 at 9:15 AM, Raja.Aravapalli [via Apache Flink User Mailing List archive.] <ml+s2336050n14716...@n4.nabble.com> wrote:

> Hi Vinay,
>
> Thanks for the response.
>
> I have NOT enabled any checkpointing.
>
> Files are rolling correctly every 2 MB, but they remain as shown below:
>
> -rw-r--r-- 3 2097424 2017-08-06 21:10 ////Test/part-0-0.pending
> -rw-r--r-- 3 1431430 2017-08-06 21:12 ////Test/part-0-1.pending
>
> Regards,
> Raja.
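The behaviour discussed in this thread can be pictured as a small state machine: a part file is written in an in-progress state, is rolled to pending when the batch size is reached or the bucket goes inactive, and is only promoted to finished when a checkpoint completes. The sketch below is a toy model in plain Java, not Flink code. The class and method names (PartFileLifecycleSketch, bucketInactive, checkpointCompleted) are hypothetical, and it simplifies by promoting every pending file on any completed checkpoint.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of the part-file lifecycle (in-progress, pending, finished).
 * NOT Flink code: all names here are hypothetical and chosen for illustration.
 */
class PartFileLifecycleSketch {

    enum State { IN_PROGRESS, PENDING, FINISHED }

    static final class PartFile {
        final String name;
        long bytes;
        State state = State.IN_PROGRESS;

        PartFile(String name) {
            this.name = name;
        }
    }

    private final long batchSize;
    private final List<PartFile> files = new ArrayList<>();
    private PartFile current; // the file currently being written, if any

    PartFileLifecycleSketch(long batchSize) {
        this.batchSize = batchSize;
    }

    /** Appends bytes to the current part file and rolls it once the batch size is reached. */
    void write(long numBytes) {
        if (current == null) {
            current = new PartFile("part-0-" + files.size());
            files.add(current);
        }
        current.bytes += numBytes;
        if (current.bytes >= batchSize) {
            roll();
        }
    }

    /** Models the inactivity check: an idle bucket has its open file rolled. */
    void bucketInactive() {
        if (current != null) {
            roll();
        }
    }

    /** Models a completed checkpoint: pending files become finished (simplified). */
    void checkpointCompleted() {
        for (PartFile f : files) {
            if (f.state == State.PENDING) {
                f.state = State.FINISHED;
            }
        }
    }

    List<PartFile> files() {
        return files;
    }

    // Rolling closes the file for writing; it waits in PENDING until a checkpoint.
    private void roll() {
        current.state = State.PENDING;
        current = null;
    }

    public static void main(String[] args) {
        PartFileLifecycleSketch sink = new PartFileLifecycleSketch(2L * 1024 * 1024);
        sink.write(2L * 1024 * 1024); // first file reaches 2 MB and is rolled
        sink.write(1_431_430L);       // second file stays below the batch size
        sink.bucketInactive();        // ... and is rolled by the inactivity check
        for (PartFile f : sink.files()) {
            System.out.println(f.name + " " + f.state);
        }
        sink.checkpointCompleted();   // without this step nothing leaves PENDING
        for (PartFile f : sink.files()) {
            System.out.println(f.name + " " + f.state);
        }
    }
}
```

In this model, a job that never calls checkpointCompleted() leaves every rolled file in PENDING, which matches the listing of two .pending files reported in the thread.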
Re: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files
Hi Vinay,

Thanks for the response.

I have NOT enabled any checkpointing.

Files are rolling correctly every 2 MB, but they remain as shown below:

-rw-r--r-- 3 2097424 2017-08-06 21:10 ////Test/part-0-0.pending
-rw-r--r-- 3 1431430 2017-08-06 21:12 ////Test/part-0-1.pending

Regards,
Raja.

From: vinay patil <vinay18.pa...@gmail.com>
Date: Sunday, August 6, 2017 at 10:40 PM
To: "user@flink.apache.org" <user@flink.apache.org>
Subject: [EXTERNAL] Re: Help required - "BucketingSink" usage to write HDFS Files

Hi Raja,

Have you enabled checkpointing?

A part file is closed and moved to the pending state when the batch size is reached (in your case 2 MB) or when the bucket has been inactive for a certain amount of time. It moves from pending to finished only after a checkpoint completes.

Regards,
Vinay Patil
Re: Help required - "BucketingSink" usage to write HDFS Files
Hi Raja,

Have you enabled checkpointing?

A part file is closed and moved to the pending state when the batch size is reached (in your case 2 MB) or when the bucket has been inactive for a certain amount of time. It moves from pending to finished only after a checkpoint completes.

Regards,
Vinay Patil

On Mon, Aug 7, 2017 at 7:53 AM, Raja.Aravapalli [via Apache Flink User Mailing List archive.] <ml+s2336050n14714...@n4.nabble.com> wrote:

> Hi,
>
> I am working on a PoC to write to HDFS files using the BucketingSink class. Even though the data is being written to HDFS files, the files remain on HDFS with a ".pending" suffix.
>
> Below is the code I am using. Can someone please help me identify the issue and fix it?
>
> BucketingSink HdfsSink = new BucketingSink("hdfs://///Test/");
> HdfsSink.setBucketer(new DateTimeBucketer("-MM-dd--HHmm"));
> HdfsSink.setBatchSize(1024 * 1024 * 2); // this is 2 MB
> HdfsSink.setInactiveBucketCheckInterval(1L);
> HdfsSink.setInactiveBucketThreshold(1L);
>
> Thanks a lot.
>
> Regards,
> Raja.
Help required - "BucketingSink" usage to write HDFS Files
Hi,

I am working on a PoC to write to HDFS files using the BucketingSink class. Even though the data is being written to HDFS files, the files remain on HDFS with a ".pending" suffix.

Below is the code I am using. Can someone please help me identify the issue and fix it?

BucketingSink HdfsSink = new BucketingSink("hdfs://///Test/");
HdfsSink.setBucketer(new DateTimeBucketer("-MM-dd--HHmm"));
HdfsSink.setBatchSize(1024 * 1024 * 2); // this is 2 MB
HdfsSink.setInactiveBucketCheckInterval(1L);
HdfsSink.setInactiveBucketThreshold(1L);

Thanks a lot.

Regards,
Raja.
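As a side note on the bucketer pattern in the code above, the following plain-Java sketch shows what directory name a date-time pattern produces for a given timestamp. It assumes the bucketer formats the current time with a SimpleDateFormat-style pattern and appends the result to the base path; the helper bucketPath, the class name, and the "/base" path are hypothetical and not part of Flink. The time zone is pinned to UTC only to make the output reproducible.

```java
import java.text.SimpleDateFormat;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical helper (not Flink code) that mimics date-time based bucket naming. */
class BucketPathSketch {

    /** Returns basePath + "/" + the timestamp formatted with the given pattern. */
    static String bucketPath(String basePath, String pattern, long epochMillis) {
        SimpleDateFormat formatter = new SimpleDateFormat(pattern, Locale.ROOT);
        // Assumption: UTC is used here purely so the example is deterministic.
        formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return basePath + "/" + formatter.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = ZonedDateTime.of(2017, 8, 6, 21, 10, 0, 0, ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();

        // Pattern as it appears in the post: no year field.
        System.out.println(bucketPath("/base", "-MM-dd--HHmm", ts));         // /base/-08-06--2110

        // Same pattern with a year field added.
        System.out.println(bucketPath("/base", "yyyy-MM-dd--HHmm", ts));     // /base/2017-08-06--2110

        // One minute later the minute-level pattern yields a different bucket.
        System.out.println(bucketPath("/base", "yyyy-MM-dd--HHmm", ts + 60_000L)); // /base/2017-08-06--2111
    }
}
```

Under these assumptions, a pattern with minute granularity starts a new bucket directory every minute, and the pattern as shown in the archive carries no year, so buckets from different years would share a name.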