Re: ElasticSearch with RestHighLevelClient
Thanks Tim, I believe I'm doing what Jean-Baptiste recommends, so I guess I'll have a look at the snapshot and see what's different. I don't mind waiting a bit if it means I don't have to duplicate working code.

ry

On Mon, Oct 23, 2017 at 3:15 PM, Tim Robertson wrote:
> [snip]
Re: ElasticSearch with RestHighLevelClient
Hi Ryan,

I can confirm 2.2.0-SNAPSHOT works fine with an ES 5.6 cluster. I am told 2.2.0 is expected within a couple of weeks. My work is only a proof of concept for now, but I put in 300M fairly small docs at around 100,000/sec on a 3-node cluster without any issue [1].

Hope this helps,
Tim

[1] https://github.com/gbif/pipelines/blob/master/gbif/src/main/java/org/gbif/pipelines/indexing/Avro2ElasticSearchPipeline.java

On Mon, Oct 23, 2017 at 9:00 PM, Jean-Baptiste Onofré wrote:
> [snip]
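Since 2.2.0 adds Elasticsearch 5.x support to the stock connector, writing with ElasticsearchIO directly looks roughly like the sketch below. This is a hedged example, not code from the thread: the cluster address, index name, and type name are placeholders, and the builder methods should be checked against the Beam 2.2.0 javadoc. It needs the Beam SDK and `beam-sdks-java-io-elasticsearch` on the classpath, plus a reachable cluster, so it is not runnable standalone.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;

public class EsWriteSketch {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // ElasticsearchIO.write() consumes a PCollection<String> of JSON documents.
    p.apply(Create.of("{\"name\":\"doc1\"}", "{\"name\":\"doc2\"}"))
     .apply(ElasticsearchIO.write()
         .withConnectionConfiguration(
             // placeholder address/index/type -- adjust for your cluster
             ElasticsearchIO.ConnectionConfiguration.create(
                 new String[]{"http://localhost:9200"}, "my-index", "my-type")));

    p.run().waitUntilFinish();
  }
}
```

Using the built-in connector also avoids having two Elasticsearch client libraries on the classpath, which was the concern raised earlier in the thread.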
Re: ElasticSearch with RestHighLevelClient
Hi Ryan,

The latest version of ElasticsearchIO (which will be included in Beam 2.2.0) supports Elasticsearch 5.x.

The client should be created in @Setup (or @StartBundle) and released cleanly in @Teardown (or @FinishBundle). It is then used in @ProcessElement to actually store the elements of the PCollection.

Regards
JB

On 10/23/2017 08:53 PM, Ryan Bobko wrote:
> [snip]

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
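The lifecycle JB describes can be sketched as a skeleton DoFn. The class, field, and method names here are illustrative, not taken from ElasticsearchIO; only the annotations are Beam's own.

```java
import java.io.Serializable;
import org.apache.beam.sdk.transforms.DoFn;

// Skeleton showing where connection management belongs in a DoFn's lifecycle.
// Assumes String input elements; substitute your real element and client types.
public class LifecycleFn extends DoFn<String, Void> implements Serializable {

  // transient: the client is not serializable and is rebuilt on each worker
  private transient AutoCloseable client;

  @Setup
  public void setup() {
    // create the client once per DoFn instance
  }

  @StartBundle
  public void startBundle() {
    // alternatively, acquire per-bundle resources here
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // use the client to index c.element()
  }

  @FinishBundle
  public void finishBundle() {
    // flush any buffered bulk requests before the bundle commits
  }

  @Teardown
  public void teardown() throws Exception {
    // release the client cleanly so no threads outlive the pipeline
    if (client != null) {
      client.close();
    }
  }
}
```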
Re: ElasticSearch with RestHighLevelClient
Hi JB,
Thanks for your input. I'm trying to update ElasticsearchIO, and hopefully learn a bit about Beam in the process. The documentation says ElasticsearchIO only works with ES 2.X, and I'm using ES 5.6. I'd prefer not to have two ES libs in my classpath if I can avoid it. I'm just getting started, so my pipeline is quite simple:

   pipeline.apply( "Raw Reader", reader ) // read raw files
       .apply( "Document Generator", ParDo.of( extractor ) ) // create my document objects for ES insertion
       .apply( "Elastic Writer", new ElasticWriter( ... ) ); // upload to ES

   public final class ElasticWriter extends PTransform, PDone> {

      private static final Logger log = LoggerFactory.getLogger( ElasticWriter.class );
      private final String elasticurl;

      public ElasticWriter( String url ) {
         elasticurl = url;
      }

      @Override
      public PDone expand( PCollection input ) {
         input.apply( ParDo.of( new WriteFn( elasticurl ) ) );
         return PDone.in( input.getPipeline() );
      }

      public static class WriteFn extends DoFn implements Serializable {

         private transient RestHighLevelClient client;
         private final String elasticurl;

         public WriteFn( String elasticurl ) {
            this.elasticurl = elasticurl;
         }

         @Setup
         public void setup() {
            log.debug( " into WriteFn::setup" );
            HttpHost elastic = HttpHost.create( elasticurl );
            RestClientBuilder bldr = RestClient.builder( elastic );

            // if this is uncommented, the program never exits
            //client = new RestHighLevelClient( bldr.build() );
         }

         @Teardown
         public void teardown() {
            log.debug( " into WriteFn::teardown" );
            // there's nothing to tear down
         }

         @ProcessElement
         public void pe( ProcessContext c ) {
            Document doc = DocumentImpl.from( c.element() );
            log.debug( "writing {} to elastic", doc.getMetadata().first( Metadata.NAME ) );

            // this is where I want to write to ES, but for now, just write a text file
            ObjectMapper mpr = new ObjectMapper();
            try ( Writer fos = new BufferedWriter( new FileWriter( new File( "/tmp/writers",
                  doc.getMetadata().first( Metadata.NAME ).asString() ) ) ) ) {
               mpr.writeValue( fos, doc );
            }
            catch ( IOException ioe ) {
               log.error( ioe.getLocalizedMessage(), ioe );
            }
         }
      }
   }

On Mon, Oct 23, 2017 at 2:28 PM, Jean-Baptiste Onofré wrote:
> [snip]
Re: ElasticSearch with RestHighLevelClient
Hi Ryan,

Why don't you use the ElasticsearchIO for that?

Anyway, can you share your pipeline where you have the ParDo calling your DoFn?

Thanks,
Regards
JB

On 10/23/2017 07:50 PM, r...@ostrich-emulators.com wrote:
> [snip]

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
ElasticSearch with RestHighLevelClient
Hi List,
I'm trying to write an updated ElasticSearch client using the newly-published RestHighLevelClient class (with ES 5.6.0). I'm only interested in writes at this time, so I'm using the ElasticsearchIO.write() function as a model. I have a transient member named client. Here's my setup function for my DoFn:

   @Setup
   public void setup() {
      HttpHost elastic = HttpHost.create( elasticurl );
      RestClientBuilder bldr = RestClient.builder( elastic );
      client = new RestHighLevelClient( bldr.build() );
   }

If I run the code as shown, I eventually get the debug message: "Pipeline has terminated. Shutting down." but the program never exits. If I comment out the client assignment above, the pipeline behaves normally (but obviously, I can't write anything to ES).

Any advice for a dev just getting started with Apache Beam (2.0.0)?

ry
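The hang described in this thread is consistent with the low-level RestClient's I/O threads being non-daemon: if the client is never closed, those threads keep the JVM alive after the pipeline finishes. A hedged sketch of a fix follows — it keeps a handle on the low-level RestClient and closes it in @Teardown. This assumes the 5.6-era API, where RestHighLevelClient wraps a RestClient and the wrapped client is what gets closed; the class name and String element type here are illustrative, and the snippet needs the Beam SDK and ES REST client jars to compile.

```java
import java.io.IOException;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

// Illustrative DoFn that closes its client so the JVM can exit cleanly.
public class ClosingWriteFn extends DoFn<String, Void> {

  private final String elasticurl;
  private transient RestClient lowLevelClient;
  private transient RestHighLevelClient client;

  public ClosingWriteFn(String elasticurl) {
    this.elasticurl = elasticurl;
  }

  @Setup
  public void setup() {
    HttpHost elastic = HttpHost.create(elasticurl);
    // keep a reference to the low-level client so @Teardown can close it
    lowLevelClient = RestClient.builder(elastic).build();
    client = new RestHighLevelClient(lowLevelClient);
  }

  @ProcessElement
  public void processElement(ProcessContext c) {
    // index c.element() via client.index(...) here
  }

  @Teardown
  public void teardown() throws IOException {
    // closing releases the client's non-daemon I/O threads,
    // which is what lets the JVM exit after the pipeline terminates
    if (lowLevelClient != null) {
      lowLevelClient.close();
    }
  }
}
```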
RE: How to use ConsoleIO SDK
Hi,

If it's all right with you, I wanna try it.

Thanks
Sincerely
Rick

-----Original Message-----
From: Jean-Baptiste Onofré [mailto:j...@nanthrax.net]
Sent: Friday, October 20, 2017 1:32 PM
To: user@beam.apache.org
Subject: Re: How to use ConsoleIO SDK

Hi,

I started to work on the ConsoleIO (and SocketIO too), but it's not yet merged. I can provide a SNAPSHOT to you if you wanna try.

Regards
JB

On 10/20/2017 04:14 AM, linr...@itri.org.tw wrote:
> Dear sir,
>
> I have a question about how to use the Beam Java SDK ConsoleIO.
>
> My objective is to write the PCollection "data" to the console, and then use it (type: RDD?) as another variable to do other work.
>
> If any further information is needed, I am glad to be informed and will provide it to you as soon as possible. I am looking forward to hearing from you.
>
> My Java code is:
>
> import java.io.IOException;
> import java.util.ArrayList;
> import java.util.List;
>
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.runners.spark.SparkRunner;
> import org.apache.beam.runners.spark.io.ConsoleIO;
> import org.apache.beam.runners.spark.SparkPipelineOptions;
> import org.apache.beam.sdk.transforms.Create;
> import org.apache.beam.sdk.values.KV;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.beam.sdk.values.TimestampedValue;
> import javafx.util.Pair;
> import org.joda.time.Duration;
> import org.joda.time.Instant;
> import org.joda.time.MutableDateTime;
>
> public static void main(String[] args) throws IOException {
>    MutableDateTime mutableNow = Instant.now().toMutableDateTime();
>    mutableNow.setDateTime(2017, 7, 12, 14, 0, 0, 0);
>    Instant starttime = mutableNow.toInstant().plus(8*60*60*1000);
>
>    int min = 2;
>    int sec = min*60;
>    int millsec = sec*1000;
>
>    double[] value = new double[] {1.0, 2.0, 3.0, 4.0, 5.0};
>    List<TimestampedValue<KV<String, Pair<Integer, Double>>>> dataList = new ArrayList<>();
>    int n = value.length;
>    int count = 0;
>    for (int i = 0; i < n; i++) {
>       count = count + 1;
>       if (i <= 3) {
>          Instant M1_time = starttime.plus(millsec*count);
>          dataList.add(TimestampedValue.of(KV.of("M1", new Pair<Integer, Double>(i, value[i])), M1_time));
>       }
>       else if (4 <= i && i < 5) {
>          Instant M2_time = starttime.plus(millsec*count);
>          dataList.add(TimestampedValue.of(KV.of("M1", new Pair<Integer, Double>(i, value[i])), M2_time));
>       }
>       else {
>          Instant M3_time = starttime.plus(millsec*count);
>          dataList.add(TimestampedValue.of(KV.of("M1", new Pair<Integer, Double>(i, value[i])), M3_time));
>       }
>       System.out.println("raw_data=" + dataList.get(i));
>    }
>
>    SparkPipelineOptions options = PipelineOptionsFactory.as(SparkPipelineOptions.class);
>    options.setRunner(SparkRunner.class);
>    options.setSparkMaster("local[4]");
>    Pipeline p = Pipeline.create(options);
>
>    PCollection<KV<String, Pair<Integer, Double>>> data = p.apply("create data with time", Create.timestamped(dataList));
>    data.apply("spark_write_on_console", ConsoleIO.Write.out);
>    p.run().waitUntilFinish();
> }
>
> Thanks very much
>
> Sincerely yours,
>
> Liang-Sian Lin, Dr.
>
> Oct 20 2017

--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

--
This email may contain confidential information. Please do not use or disclose it in any way and delete it if you are not the intended recipient.
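Until ConsoleIO is merged, one runner-agnostic way to inspect a PCollection on the console is a pass-through ParDo that prints each element and re-emits it, so downstream transforms can keep using the collection. This is a hedged sketch, not part of the SDK: the class name and transform label are made up, and it needs the Beam SDK on the classpath.

```java
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

// Illustrative pass-through DoFn: prints each element, then re-emits it
// unchanged so it can feed further transforms.
public class PrintFn<T> extends DoFn<T, T> {
  @ProcessElement
  public void processElement(ProcessContext c) {
    System.out.println("element=" + c.element());
    c.output(c.element());
  }
}

// usage (illustrative):
//   PCollection<KV<String, Pair<Integer, Double>>> printed =
//       data.apply("print", ParDo.of(new PrintFn<KV<String, Pair<Integer, Double>>>()));
```

Note that printing happens on whichever worker processes the bundle, so with a distributed runner the output lands in worker logs rather than the driver console; with `local[4]` it appears in the local JVM's stdout.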