New blog post: Splittable DoFn

2017-08-16 Thread Eugene Kirpichov
Hi all,

The blog post "Powerful and modular IO connectors with Splittable DoFn in
Apache Beam" just went live - take a look!

*One of the most important parts of the Apache Beam ecosystem is its
quickly growing set of connectors that allow Beam pipelines to read and
write data to various data storage systems (“IOs”). Currently, Beam ships
over 20 IO connectors with many more in active development. As user demands
for IO connectors grew, our work on improving the related Beam APIs (in
particular, the Source API) produced an unexpected result: a generalization
of Beam’s most basic primitive, DoFn.*

Thanks to all the reviewers of the PR for edit suggestions!


Re: Resampling a timeserie stuck on a GroupByKey

2017-08-16 Thread Lukasz Cwik
Do you have some job ids that you could share?

On Wed, Aug 16, 2017 at 1:18 PM, Tristan Marechaux <
tristan.marech...@walnut-algo.com> wrote:

> Thanks for the invitation and for the answer.
>
> I tried the Count resample function and I still have the same issue, so I
> guess it doesn't come from my resample function, but here is the code in
> case :
>
> def resample_function(candles):
>
> sorted_candles = sorted(filter(lambda x: x.date is not None, candles), 
> key=lambda candle: candle.date)
> if len(sorted_candles) > 0:
> return Candle(
> sorted_candles[-1].date,
> sorted_candles[0].open,
> max(candle.high for candle in candles),
> min(candle.low for candle in candles),
> sorted_candles[-1].close,
> sum((candle.volume for candle in candles), .0)
> )
>
>
> The fact is that the pipeline seems stucked on the GroupByKey inside the
> CombineGlobaly PTransform before the call of my resample_function (if the
> GCP web interface is accurate).
>
> I tried the with to have in my pipeline only native python type with the
> CountCombineFn and it's still stucked.
>
> Here is what I can see on my GCP console (this screenshot shows 36 minutes
> by I waited for 5 hours to be sure) :
> [image: Selection_070.png]
>
>
> On Wed, Aug 16, 2017 at 1:08 AM Lukasz Cwik  wrote:
>
>> I have invited you to the slack channel.
>>
>> 2 million data points doesn't seem like it should be an issue.
>> Have you considered trying a simpler combiner like Count to see if the
>> bottleneck is with the combiner that you are supplying?
>> Also, could you share the code for what resample_function does?
>>
>> On Mon, Aug 14, 2017 at 2:43 AM, Tristan Marechaux <
>> tristan.marech...@walnut-algo.com> wrote:
>>
>>> Hi all,
>>>
>>> I wrote a Beam Pipeline written with the python SDK that resample a
>>> timeseries containing data points everery minute to a 5-minutes timeserie.
>>>
>>> My pipeline looks like:
>>> input_data | 
>>> WindowInto(FixedWindows(size=timedelta(minutes=5).total_seconds()))
>>> | CombineGlobaly(resample_function)
>>>
>>> When I run it with the local or DataFlow runner with a small dataset, it
>>> works and does what I want.
>>>
>>> But when I try to run it on the DataFlow runner with a bigger dataset (1
>>> 700 000 datapoints timestamped over 15 years) it stay stuck for hours on
>>> the GroupByKey step of CombineGlobaly.
>>>
>>> My question is : Did I do something wrong with the design of my pipeline?
>>>
>>> PS: Can someone invite me to the slack channel?
>>> --
>>>
>>> Tristan Marechaux
>>>
>>> Data Scientist | *Walnut Algorithms*
>>>
>>> Mobile : +33 627804399 <+33627804399>
>>>
>>> Email: tristan.marech...@walnut-algo.com
>>>
>>> Web: www.walnutalgorithms.com
>>>
>>
>> --
>
> Tristan Marechaux
>
> Data Scientist | *Walnut Algorithms*
>
> Mobile : +33 627804399 <+33627804399>
>
> Email: tristan.marech...@walnut-algo.com
>
> Web: www.walnutalgorithms.com
>


Re: Resampling a timeserie stuck on a GroupByKey

2017-08-16 Thread Tristan Marechaux
Thanks for the invitation and for the answer.

I tried Count in place of my resample function and I still have the same
issue, so I guess the problem doesn't come from my resample function, but
here is the code just in case:

def resample_function(candles):
    # Materialize the input so it can be iterated more than once.
    candles = list(candles)
    # Ignore candles without a date and order the rest chronologically.
    sorted_candles = sorted(
        (candle for candle in candles if candle.date is not None),
        key=lambda candle: candle.date)
    if len(sorted_candles) > 0:
        return Candle(
            sorted_candles[-1].date,
            sorted_candles[0].open,
            max(candle.high for candle in candles),
            min(candle.low for candle in candles),
            sorted_candles[-1].close,
            sum((candle.volume for candle in candles), .0)
        )


The pipeline seems to be stuck on the GroupByKey inside the
CombineGlobally PTransform, before my resample_function is called (if the
GCP web interface is accurate).

I also tried to have only native Python types in my pipeline, using the
CountCombineFn, and it is still stuck.
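
For comparison, the same resampling could also be phrased as an incremental
combiner, where each candle is folded into a small running summary instead of
being collected into one list per window. The sketch below is only an
illustration under assumptions: it does not import Beam, the Candle and
Summary namedtuples (and their field order) are guessed from the constructor
call above, and the four function names simply mirror the usual
create/add/merge/extract shape of a combine function. Unlike the function
above, it skips undated candles entirely.

```python
from collections import namedtuple

# Hypothetical stand-in for the Candle type used above; field order is assumed.
Candle = namedtuple("Candle", "date open high low close volume")

# Running summary. First and last dates are kept separately so that partial
# summaries can be merged in any order.
Summary = namedtuple(
    "Summary", "first_date open high low last_date close volume")


def create_accumulator():
    # Nothing seen yet.
    return None


def add_input(acc, candle):
    # Fold one candle into the running summary; undated candles are skipped.
    if candle.date is None:
        return acc
    single = Summary(candle.date, candle.open, candle.high, candle.low,
                     candle.date, candle.close, float(candle.volume))
    return merge_accumulators([acc, single])


def merge_accumulators(accs):
    # Combine partial summaries (None means "empty").
    parts = [a for a in accs if a is not None]
    if not parts:
        return None
    first = min(parts, key=lambda s: s.first_date)
    last = max(parts, key=lambda s: s.last_date)
    return Summary(first.first_date, first.open,
                   max(s.high for s in parts),
                   min(s.low for s in parts),
                   last.last_date, last.close,
                   sum(s.volume for s in parts))


def extract_output(acc):
    # Turn the summary back into a resampled Candle (or None if empty).
    if acc is None:
        return None
    return Candle(acc.last_date, acc.open, acc.high, acc.low,
                  acc.close, acc.volume)
```

Because the summary has a fixed size, merging two partial results costs the
same no matter how many candles each one covers.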

Here is what I can see on my GCP console (this screenshot shows 36 minutes,
but I waited for 5 hours to be sure):
[image: Selection_070.png]


On Wed, Aug 16, 2017 at 1:08 AM Lukasz Cwik  wrote:

> I have invited you to the slack channel.
>
> 2 million data points doesn't seem like it should be an issue.
> Have you considered trying a simpler combiner like Count to see if the
> bottleneck is with the combiner that you are supplying?
> Also, could you share the code for what resample_function does?
>
> On Mon, Aug 14, 2017 at 2:43 AM, Tristan Marechaux <
> tristan.marech...@walnut-algo.com> wrote:
>
>> Hi all,
>>
>> I wrote a Beam Pipeline written with the python SDK that resample a
>> timeseries containing data points everery minute to a 5-minutes timeserie.
>>
>> My pipeline looks like:
>> input_data |
>> WindowInto(FixedWindows(size=timedelta(minutes=5).total_seconds())) |
>> CombineGlobaly(resample_function)
>>
>> When I run it with the local or DataFlow runner with a small dataset, it
>> works and does what I want.
>>
>> But when I try to run it on the DataFlow runner with a bigger dataset (1
>> 700 000 datapoints timestamped over 15 years) it stay stuck for hours on
>> the GroupByKey step of CombineGlobaly.
>>
>> My question is : Did I do something wrong with the design of my pipeline?
>>
>> PS: Can someone invite me to the slack channel?
>> --
>>
>> Tristan Marechaux
>>
>> Data Scientist | *Walnut Algorithms*
>>
>> Mobile : +33 627804399 <+33627804399>
>>
>> Email: tristan.marech...@walnut-algo.com
>>
>> Web: www.walnutalgorithms.com
>>
>
> --

Tristan Marechaux

Data Scientist | *Walnut Algorithms*

Mobile : +33 627804399 <+33627804399>

Email: tristan.marech...@walnut-algo.com

Web: www.walnutalgorithms.com


Re: Slack channel

2017-08-16 Thread Steve Niemitz
Ah interesting, I guess no one told the Mesos guys that! :D

Thanks for the invite though!

On Wed, Aug 16, 2017 at 1:25 PM, Lukasz Cwik  wrote:

> Welcome Griselda, Steve, and Apache.
>
> Steve, this has come up before but it is against Slack's free tier policy
> for having a bot which sends invites out automatically.
>
> On Wed, Aug 16, 2017 at 10:18 AM, Apache Enthu 
> wrote:
>
>> Please could you add me too?
>>
>> Thanks,
>> Almas
>>
>> On 16 Aug 2017 22:41, "Steve Niemitz"  wrote:
>>
>>> I'll jump on this thread as well, can I get an invite too?
>>>
>>> Also, has anyone though of making this self service?  The apache mesos
>>> slack has this set up [1].
>>>
>>> [1] https://mesos-slackin.herokuapp.com
>>>
>>> On Aug 16, 2017 1:08 PM, "Griselda Cuevas"  wrote:
>>>
>>>> Hi Manu, I'd like to piggy back on shen's request, could you add me to
>>>> the channel as well?
>>>>
>>>> On 15 August 2017 at 21:32, Manu Zhang  wrote:
>>>>
>>>>> Invitation sent. Welcome.
>>>>>
>>>>> On Wed, Aug 16, 2017 at 11:41 AM shen yu  wrote:
>>>>>
>>>>>> Hi, I'd like to join the Slack channel for Apache Beam. I work at
>>>>>> Klook and would like to get involved in the Apache Beam community.
>>>>>> My email is you...@klook.com
>>>>>>
>>>>>
>>>>
>


Re: Slack channel

2017-08-16 Thread Lukasz Cwik
Welcome Griselda, Steve, and Apache.

Steve, this has come up before, but it is against Slack's free tier policy
to have a bot that sends out invites automatically.

On Wed, Aug 16, 2017 at 10:18 AM, Apache Enthu 
wrote:

> Please could you add me too?
>
> Thanks,
> Almas
>
> On 16 Aug 2017 22:41, "Steve Niemitz"  wrote:
>
>> I'll jump on this thread as well, can I get an invite too?
>>
>> Also, has anyone though of making this self service?  The apache mesos
>> slack has this set up [1].
>>
>> [1] https://mesos-slackin.herokuapp.com
>>
>> On Aug 16, 2017 1:08 PM, "Griselda Cuevas"  wrote:
>>
>>> Hi Manu, I'd like to piggy back on shen's request, could you add me to
>>> the channel as well?
>>>
>>> On 15 August 2017 at 21:32, Manu Zhang  wrote:
>>>
>>>> Invitation sent. Welcome.
>>>>
>>>> On Wed, Aug 16, 2017 at 11:41 AM shen yu  wrote:
>>>>
>>>>> Hi, I'd like to join the Slack channel for Apache Beam. I work at
>>>>> Klook and would like to get involved in the Apache Beam community. My
>>>>> email is you...@klook.com
>>>>>
>>>>
>>>


Re: Slack channel

2017-08-16 Thread Apache Enthu
Please could you add me too?

Thanks,
Almas

On 16 Aug 2017 22:41, "Steve Niemitz"  wrote:

> I'll jump on this thread as well, can I get an invite too?
>
> Also, has anyone though of making this self service?  The apache mesos
> slack has this set up [1].
>
> [1] https://mesos-slackin.herokuapp.com
>
> On Aug 16, 2017 1:08 PM, "Griselda Cuevas"  wrote:
>
>> Hi Manu, I'd like to piggy back on shen's request, could you add me to
>> the channel as well?
>>
>> On 15 August 2017 at 21:32, Manu Zhang  wrote:
>>
>>> Invitation sent. Welcome.
>>>
>>> On Wed, Aug 16, 2017 at 11:41 AM shen yu  wrote:
>>>
>>>> Hi, I'd like to join the Slack channel for Apache Beam. I work at
>>>> Klook and would like to get involved in the Apache Beam community. My
>>>> email is you...@klook.com
>>>>
>>>
>>


Re: Slack channel

2017-08-16 Thread Steve Niemitz
I'll jump on this thread as well, can I get an invite too?

Also, has anyone thought of making this self-service?  The Apache Mesos
Slack has this set up [1].

[1] https://mesos-slackin.herokuapp.com

On Aug 16, 2017 1:08 PM, "Griselda Cuevas"  wrote:

> Hi Manu, I'd like to piggy back on shen's request, could you add me to the
> channel as well?
>
> On 15 August 2017 at 21:32, Manu Zhang  wrote:
>
>> Invitation sent. Welcome.
>>
>> On Wed, Aug 16, 2017 at 11:41 AM shen yu  wrote:
>>
>>> Hi, I'd like to join the Slack channel for Apache Beam. I work at Klook
>>> and would like to get involved in the Apache Beam community. My email
>>> is you...@klook.com
>>>
>>
>


Re: Using external python packges in pipeline

2017-08-16 Thread Chamikara Jayalath
On Tue, Aug 15, 2017 at 11:31 PM Chamikara Jayalath 
wrote:

> Please see following regarding managing dependencies for Python SDK.
> https://cloud.google.com/dataflow/pipelines/dependencies-python
>

Actually, we have an updated Apache Beam version of that document.
https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/

Thanks,
Cham

>
> Stack Overflow is a better resource for Cloud Dataflow specific questions.
> https://stackoverflow.com/questions/tagged/google-cloud-dataflow
>
> Thanks,
> Cham
>
>
> On Tue, Aug 15, 2017 at 8:53 PM shen yu  wrote:
>
>> Hi, I'm using apache-beam python sdk for running pipelines. How do I
>> install 3rd party packages and use them in my pipeline? I always get
>> ImportError: No module named ... Do I have to create a template and specify
>> staging_location, temp_location... in order to use external packages?
>> (right now I'm using Google cloud shell to run pipelines and install
>> packages by running pip install)
>>
>> p.s. I'm not sure if this is the right place to ask platform-specific
>> (Google cloud Dataflow) questions?
>>
>> Thanks in advance
>>
>> youxun
>>
>


Re: Slack channel

2017-08-16 Thread Griselda Cuevas
Hi Manu, I'd like to piggyback on shen's request. Could you add me to the
channel as well?

On 15 August 2017 at 21:32, Manu Zhang  wrote:

> Invitation sent. Welcome.
>
> On Wed, Aug 16, 2017 at 11:41 AM shen yu  wrote:
>
>> Hi, I'd like to join the Slack channel for Apache Beam. I work at Klook
>> and would like to get involved in the Apache Beam community. My email is
>> you...@klook.com
>>
>


ConcurrentModificationException while performing checkpoint for Kinesis stream

2017-08-16 Thread Pawel Bartoszek
When Flink performs a checkpoint, I randomly get a
ConcurrentModificationException.

From my investigation it looks like the method

public boolean advance() throws IOException


from

https://github.com/apache/beam/blob/release-2.0.0/sdks/java/io/kinesis/src/main/java/org/apache/beam/sdk/io/kinesis/KinesisReader.java


is called in another thread while the checkpoint is being performed.

The exception occurs because the method

public UnboundedSource.CheckpointMark getCheckpointMark()

in KinesisReader.java is iterating over the iterator returned by
RoundRobin.iterator() while

public boolean advance() throws IOException

is calling RoundRobin.moveForward() from another thread, which causes
java.util.ConcurrentModificationException to be thrown.


The RoundRobin class uses a java.util.Deque (an ArrayDeque, according to the
stack trace below), which does not allow elements to be added or removed
while it is being iterated.

Is some locking missing?
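
To make the suspected race concrete, here is a small Python analogue. It is
not the Beam or Java code: the RoundRobin class below is a hypothetical
stand-in, and Python's collections.deque is used only because it rejects
mutation during iteration in a similar way (it raises RuntimeError where
Java throws ConcurrentModificationException). The interleaving is simulated
in a single thread so that it is deterministic. The snapshot method shows one
possible remedy, assuming a lock shared by both code paths is acceptable.

```python
import threading
from collections import deque


class RoundRobin:
    """Hypothetical Python stand-in for the Java RoundRobin helper."""

    def __init__(self, items):
        self._deque = deque(items)
        self._lock = threading.Lock()

    def move_forward_unsafe(self):
        # Rotates the queue with no locking (the racy path).
        self._deque.append(self._deque.popleft())

    def move_forward(self):
        # Same rotation, but guarded by the shared lock.
        with self._lock:
            self._deque.append(self._deque.popleft())

    def snapshot(self):
        # Copy under the lock so the caller iterates over a private list.
        with self._lock:
            return list(self._deque)


def iterate_while_rotating(rr):
    # Simulates "advance() runs while getCheckpointMark() is iterating":
    # the queue is rotated in the middle of an iteration over it.
    try:
        for _ in rr._deque:
            rr.move_forward_unsafe()
    except RuntimeError as exc:
        return str(exc)
    return None
```

Iterating over the snapshot instead of the live queue lets the rotation
proceed without invalidating the iteration, at the cost of one copy per
checkpoint.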

I am using Beam 2.0.0 and Flink 1.2.1, with 20 slots and 32 Kinesis shards.

I have also filed a bug for it:
https://issues.apache.org/jira/browse/BEAM-2752

Stacktrace:

java.lang.Exception: Error while triggering checkpoint 59 for Source:
Read(KinesisSource) -> Flat Map -> ParMultiDo(KinesisExtractor) ->
Flat Map -> ParMultiDo(StringToRecord) -> Flat Map ->
ParMultiDo(Anonymous) -> Flat Map -> ParMultiDo(ToRRecord) -> Flat Map
-> ParMultiDo(AddTimestamps) -> Flat Map ->
..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE MINUTE
WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem,
ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) ->
Flat Map -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1136)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not perform checkpoint 59 for
operator Source: Read(KinesisSource) -> Flat Map ->
ParMultiDo(KinesisExtractor) -> Flat Map -> ParMultiDo(StringToRecord)
-> Flat Map -> ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) -> Flat
Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE
MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map
-> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem,
ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) ->
Flat Map -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:524)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1125)
... 5 more
Caused by: java.lang.Exception: Could not complete snapshot 59 for
operator Source: Read(KinesisSource) -> Flat Map ->
ParMultiDo(KinesisExtractor) -> Flat Map -> ParMultiDo(StringToRecord)
-> Flat Map -> ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ToRRecord) -> Flat Map -> ParMultiDo(AddTimestamps) -> Flat
Map -> ..GroupByOneMinuteWindow GROUP RDOTRECORDS BY ONE
MINUTE WINDOWS/Window.Assign.out -> (ParMultiDo(Anonymous) -> Flat Map
-> ParMultiDo(ToSomeKey) -> Flat Map -> ToKeyedWorkItem,
ParMultiDo(ToCompositeKey) -> Flat Map -> ParMultiDo(Anonymous) ->
Flat Map -> ToKeyedWorkItem, ParMultiDo(Anonymous) -> Flat Map ->
ParMultiDo(ApplyShardingKey) -> Flat Map -> ToKeyedWorkItem) (1/20).
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.snapshotState(AbstractStreamOperator.java:379)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.checkpointStreamOperator(StreamTask.java:1157)
at org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:1090)
at org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:630)
at org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:575)
at org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:518)
... 6 more
Caused by: java.util.ConcurrentModificationException
at java.util.ArrayDeque$DeqIterator.next(ArrayDeque.java:643)
at org.apache.beam.sdks.java.io.kinesis.repackaged.com.google.common.collect.TransformedIterator.next(TransformedIterator.java:47)
at org.apache.beam.sdks.java.io.kines

Re: Failed to run Wordcount example

2017-08-16 Thread Manu Zhang
I have seen the same error. The first command creates a word-count-beam
subproject under the root. The project has a different version (0.1) than
its parent (2.0.0), and that sets all Beam dependencies to version 0.1. Once
I changed the version to 2.0.0, the second command ran successfully.

I guess it's problematic in Maven for a child to have a different version
than its parent, no?
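
A quick way to check for this kind of mismatch is to list the versions that
the Beam dependencies in a pom.xml resolve to. The sketch below is only an
illustration: the sample POM is hypothetical and heavily trimmed, and its use
of ${project.version} is an assumption about how the project's own version
could leak into the Beam dependencies, not a copy of the generated file. The
helper name beam_dependency_versions is likewise made up for this example.

```python
import xml.etree.ElementTree as ET

# Standard namespace of Maven POM files.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}

# Hypothetical, trimmed-down POM used for illustration only.
EXAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.example</groupId>
  <artifactId>word-count-beam</artifactId>
  <version>0.1</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <artifactId>beam-sdks-java-core</artifactId>
      <version>${project.version}</version>
    </dependency>
  </dependencies>
</project>"""


def beam_dependency_versions(pom_text):
    """Map each org.apache.beam artifactId to its declared version."""
    root = ET.fromstring(pom_text)
    project_version = root.findtext("m:version", namespaces=POM_NS) or ""
    versions = {}
    for dep in root.findall("m:dependencies/m:dependency", POM_NS):
        if dep.findtext("m:groupId", namespaces=POM_NS) != "org.apache.beam":
            continue
        version = dep.findtext("m:version", default="", namespaces=POM_NS)
        # Resolve only the single placeholder this sketch cares about.
        version = version.replace("${project.version}", project_version)
        versions[dep.findtext("m:artifactId", namespaces=POM_NS)] = version
    return versions
```

If any Beam artifact comes back with the project's own version (0.1 here)
rather than a released Beam version, Maven will look for an artifact that
was never published.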

Thanks,
Manu

On Wed, Aug 16, 2017 at 6:07 PM Vincent Wang  wrote:

> Oh, that's weird.
> I just used a new environment and reproduced this problem. I checked the
> network and all the artifacts are correctly fetched except
> the beam-sdks-java-extensions-google-cloud-platform-core.
>
> I noticed maven is trying to resolve
> beam-sdks-java-extensions-google-cloud-platform-core-0.1.jar but
>
> https://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-extensions-google-cloud-platform-core/0.1/beam-sdks-java-extensions-google-cloud-platform-core-0.1.jar
>  does
> not exist.
>
> I'll have a deeper dig into it.
>
> Thanks,
> Huafeng
>
> On Wed, Aug 16, 2017 at 4:29 PM Ismaël Mejía wrote:
>
>> I just executed the same commands that you pasted in your email and it
>> worked for me, can you verify that you are not having network issues
>> while downloading the dependencies with maven ?
>>
>> On Wed, Aug 16, 2017 at 10:06 AM, Vincent Wang 
>> wrote:
>> > Hi Ismaël,
>> >
>> >   I'm running the command
>> >
>> >  mvn archetype:generate \
>> >   -DarchetypeGroupId=org.apache.beam \
>> >   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>> >   -DarchetypeVersion=2.0.0 \
>> >   -DgroupId=org.example \
>> >   -DartifactId=word-count-beam \
>> >   -Dversion="0.1" \
>> >   -Dpackage=org.apache.beam.examples \
>> >   -DinteractiveMode=false
>> >
>> > and
>> >
>> > mvn compile exec:java
>> -Dexec.mainClass=org.apache.beam.examples.WordCount \
>> >  -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
>> >
>> > just the way on the quick start page. It seems that the example somehow
>> > takes its own version as some beam dependency's version accidentally.
>> >
>> > BTW, I'm using the latest master branch.
>> >
>> > Thanks,
>> > Huafeng
>> >
>> >
>> > On Wed, Aug 16, 2017 at 3:57 PM Ismaël Mejía wrote:
>> >>
>> >> Hello,
>> >>
>> >> The error message shows that it is looking for the Beam 0.1 version
>> >> and that version does not exist in maven central.
>> >> You have to replace the version of Beam in the command you executed
>> >> with the latest version that means 2.0.0 at this moment and it should
>> >> work.
>> >>
>> >> Regards,
>> >> Ismaël
>> >>
>> >>
>> >> On Wed, Aug 16, 2017 at 8:21 AM, Vincent Wang 
>> wrote:
>> >> > Hi guys,
>> >> >
>> >> >   I'm trying to run the wordcount example according to the quick
>> start
>> >> > but I
>> >> > got following error:
>> >> >
>> >> > [INFO]
>> >> >
>> 
>> >> > [INFO] Building word-count-beam 0.1
>> >> > [INFO]
>> >> >
>> 
>> >> > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
>> >> > missing, no dependency information available
>> >> > [WARNING] Failed to retrieve plugin descriptor for
>> >> > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
>> >> > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
>> could
>> >> > not
>> >> > be resolved: Failure to find
>> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
>> >> > in
>> >> > https://repo.maven.apache.org/maven2 was cached in the local
>> repository,
>> >> > resolution will not be reattempted until the update interval of
>> central
>> >> > has
>> >> > elapsed or updates are forced
>> >> > [WARNING] The POM for org.apache.beam:beam-sdks-java-core:jar:0.1 is
>> >> > missing, no dependency information available
>> >> > [WARNING] The POM for
>> >> >
>> >> >
>> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
>> >> > is missing, no dependency information available
>> >> > [WARNING] The POM for
>> >> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1 is
>> missing,
>> >> > no
>> >> > dependency information available
>> >> > [INFO]
>> >> >
>> 
>> >> > [INFO] BUILD FAILURE
>> >> > [INFO]
>> >> >
>> 
>> >> > [INFO] Total time: 0.960 s
>> >> > [INFO] Finished at: 2017-08-16T14:14:42+08:00
>> >> > [INFO] Final Memory: 18M/309M
>> >> > [INFO]
>> >> >
>> 
>> >> > [ERROR] Failed to execute goal on project word-count-beam: Could not
>> >> > resolve
>> >> > dependencies for project org.example:word-count-beam:jar:0.1: The
>> >> > following
>> >> > artifacts could not be resolved:
>> >> >
>> >> >
>> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1,
>> >> > org.apache.bea

Re: Failed to run Wordcount example

2017-08-16 Thread Vincent Wang
Oh, that's weird.
I just used a new environment and reproduced the problem. I checked the
network, and all the artifacts are fetched correctly except
beam-sdks-java-extensions-google-cloud-platform-core.

I noticed Maven is trying to resolve
beam-sdks-java-extensions-google-cloud-platform-core-0.1.jar, but
https://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-extensions-google-cloud-platform-core/0.1/beam-sdks-java-extensions-google-cloud-platform-core-0.1.jar
does not exist.

I'll dig deeper into it.

Thanks,
Huafeng

On Wed, Aug 16, 2017 at 4:29 PM Ismaël Mejía wrote:

> I just executed the same commands that you pasted in your email and it
> worked for me, can you verify that you are not having network issues
> while downloading the dependencies with maven ?
>
> On Wed, Aug 16, 2017 at 10:06 AM, Vincent Wang 
> wrote:
> > Hi Ismaël,
> >
> >   I'm running the command
> >
> >  mvn archetype:generate \
> >   -DarchetypeGroupId=org.apache.beam \
> >   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
> >   -DarchetypeVersion=2.0.0 \
> >   -DgroupId=org.example \
> >   -DartifactId=word-count-beam \
> >   -Dversion="0.1" \
> >   -Dpackage=org.apache.beam.examples \
> >   -DinteractiveMode=false
> >
> > and
> >
> > mvn compile exec:java
> -Dexec.mainClass=org.apache.beam.examples.WordCount \
> >  -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
> >
> > just the way on the quick start page. It seems that the example somehow
> > takes its own version as some beam dependency's version accidentally.
> >
> > BTW, I'm using the latest master branch.
> >
> > Thanks,
> > Huafeng
> >
> >
> > On Wed, Aug 16, 2017 at 3:57 PM Ismaël Mejía wrote:
> >>
> >> Hello,
> >>
> >> The error message shows that it is looking for the Beam 0.1 version
> >> and that version does not exist in maven central.
> >> You have to replace the version of Beam in the command you executed
> >> with the latest version that means 2.0.0 at this moment and it should
> >> work.
> >>
> >> Regards,
> >> Ismaël
> >>
> >>
> >> On Wed, Aug 16, 2017 at 8:21 AM, Vincent Wang 
> wrote:
> >> > Hi guys,
> >> >
> >> >   I'm trying to run the wordcount example according to the quick start
> >> > but I
> >> > got following error:
> >> >
> >> > [INFO]
> >> >
> 
> >> > [INFO] Building word-count-beam 0.1
> >> > [INFO]
> >> >
> 
> >> > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> >> > missing, no dependency information available
> >> > [WARNING] Failed to retrieve plugin descriptor for
> >> > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> >> > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies
> could
> >> > not
> >> > be resolved: Failure to find
> org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
> >> > in
> >> > https://repo.maven.apache.org/maven2 was cached in the local
> repository,
> >> > resolution will not be reattempted until the update interval of
> central
> >> > has
> >> > elapsed or updates are forced
> >> > [WARNING] The POM for org.apache.beam:beam-sdks-java-core:jar:0.1 is
> >> > missing, no dependency information available
> >> > [WARNING] The POM for
> >> >
> >> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> >> > is missing, no dependency information available
> >> > [WARNING] The POM for
> >> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1 is missing,
> >> > no
> >> > dependency information available
> >> > [INFO]
> >> >
> 
> >> > [INFO] BUILD FAILURE
> >> > [INFO]
> >> >
> 
> >> > [INFO] Total time: 0.960 s
> >> > [INFO] Finished at: 2017-08-16T14:14:42+08:00
> >> > [INFO] Final Memory: 18M/309M
> >> > [INFO]
> >> >
> 
> >> > [ERROR] Failed to execute goal on project word-count-beam: Could not
> >> > resolve
> >> > dependencies for project org.example:word-count-beam:jar:0.1: The
> >> > following
> >> > artifacts could not be resolved:
> >> >
> >> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1,
> >> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1: Failure to
> >> > find
> >> >
> >> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> >> > in https://repo.maven.apache.org/maven2 was cached in the local
> >> > repository,
> >> > resolution will not be reattempted until the update interval of
> central
> >> > has
> >> > elapsed or updates are forced -> [Help 1]
> >> >
> >> >   Any idea?
> >> >
> >> > Thanks,
> >> > Huafeng
>


Re: Failed to run Wordcount example

2017-08-16 Thread Ismaël Mejía
I just executed the same commands that you pasted in your email and they
worked for me. Can you verify that you are not having network issues
while downloading the dependencies with Maven?

On Wed, Aug 16, 2017 at 10:06 AM, Vincent Wang  wrote:
> Hi Ismaël,
>
>   I'm running the command
>
>  mvn archetype:generate \
>   -DarchetypeGroupId=org.apache.beam \
>   -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
>   -DarchetypeVersion=2.0.0 \
>   -DgroupId=org.example \
>   -DartifactId=word-count-beam \
>   -Dversion="0.1" \
>   -Dpackage=org.apache.beam.examples \
>   -DinteractiveMode=false
>
> and
>
> mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
>  -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner
>
> just the way on the quick start page. It seems that the example somehow
> takes its own version as some beam dependency's version accidentally.
>
> BTW, I'm using the latest master branch.
>
> Thanks,
> Huafeng
>
>
> On Wed, Aug 16, 2017 at 3:57 PM Ismaël Mejía wrote:
>>
>> Hello,
>>
>> The error message shows that it is looking for the Beam 0.1 version
>> and that version does not exist in maven central.
>> You have to replace the version of Beam in the command you executed
>> with the latest version that means 2.0.0 at this moment and it should
>> work.
>>
>> Regards,
>> Ismaël
>>
>>
>> On Wed, Aug 16, 2017 at 8:21 AM, Vincent Wang  wrote:
>> > Hi guys,
>> >
>> >   I'm trying to run the wordcount example according to the quick start
>> > but I
>> > got following error:
>> >
>> > [INFO]
>> > 
>> > [INFO] Building word-count-beam 0.1
>> > [INFO]
>> > 
>> > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
>> > missing, no dependency information available
>> > [WARNING] Failed to retrieve plugin descriptor for
>> > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
>> > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could
>> > not
>> > be resolved: Failure to find org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
>> > in
>> > https://repo.maven.apache.org/maven2 was cached in the local repository,
>> > resolution will not be reattempted until the update interval of central
>> > has
>> > elapsed or updates are forced
>> > [WARNING] The POM for org.apache.beam:beam-sdks-java-core:jar:0.1 is
>> > missing, no dependency information available
>> > [WARNING] The POM for
>> >
>> > org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
>> > is missing, no dependency information available
>> > [WARNING] The POM for
>> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1 is missing,
>> > no
>> > dependency information available
>> > [INFO]
>> > 
>> > [INFO] BUILD FAILURE
>> > [INFO]
>> > 
>> > [INFO] Total time: 0.960 s
>> > [INFO] Finished at: 2017-08-16T14:14:42+08:00
>> > [INFO] Final Memory: 18M/309M
>> > [INFO]
>> > 
>> > [ERROR] Failed to execute goal on project word-count-beam: Could not
>> > resolve
>> > dependencies for project org.example:word-count-beam:jar:0.1: The
>> > following
>> > artifacts could not be resolved:
>> >
>> > org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1,
>> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1: Failure to
>> > find
>> >
>> > org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
>> > in https://repo.maven.apache.org/maven2 was cached in the local
>> > repository,
>> > resolution will not be reattempted until the update interval of central
>> > has
>> > elapsed or updates are forced -> [Help 1]
>> >
>> >   Any idea?
>> >
>> > Thanks,
>> > Huafeng


Re: Failed to run Wordcount example

2017-08-16 Thread Vincent Wang
Hi Ismaël,

  I'm running the command

 mvn archetype:generate \
  -DarchetypeGroupId=org.apache.beam \
  -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \
  -DarchetypeVersion=2.0.0 \
  -DgroupId=org.example \
  -DartifactId=word-count-beam \
  -Dversion="0.1" \
  -Dpackage=org.apache.beam.examples \
  -DinteractiveMode=false

and

mvn compile exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
 -Dexec.args="--inputFile=pom.xml --output=counts" -Pdirect-runner

exactly as shown on the quick start page. It seems that the example somehow
uses its own version as the version of some Beam dependencies by accident.

BTW, I'm using the latest master branch.

Thanks,
Huafeng


On Wed, Aug 16, 2017 at 3:57 PM Ismaël Mejía wrote:

> Hello,
>
> The error message shows that it is looking for the Beam 0.1 version
> and that version does not exist in maven central.
> You have to replace the version of Beam in the command you executed
> with the latest version that means 2.0.0 at this moment and it should
> work.
>
> Regards,
> Ismaël
>
>
> On Wed, Aug 16, 2017 at 8:21 AM, Vincent Wang  wrote:
> > Hi guys,
> >
> >   I'm trying to run the wordcount example according to the quick start
> but I
> > got following error:
> >
> > [INFO]
> > 
> > [INFO] Building word-count-beam 0.1
> > [INFO]
> > 
> > [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> > missing, no dependency information available
> > [WARNING] Failed to retrieve plugin descriptor for
> > org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> > org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could
> not
> > be resolved: Failure to find org.eclipse.m2e:lifecycle-mapping:jar:1.0.0
> in
> > https://repo.maven.apache.org/maven2 was cached in the local repository,
> > resolution will not be reattempted until the update interval of central
> has
> > elapsed or updates are forced
> > [WARNING] The POM for org.apache.beam:beam-sdks-java-core:jar:0.1 is
> > missing, no dependency information available
> > [WARNING] The POM for
> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> > is missing, no dependency information available
> > [WARNING] The POM for
> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1 is missing, no
> > dependency information available
> > [INFO]
> > 
> > [INFO] BUILD FAILURE
> > [INFO]
> > 
> > [INFO] Total time: 0.960 s
> > [INFO] Finished at: 2017-08-16T14:14:42+08:00
> > [INFO] Final Memory: 18M/309M
> > [INFO]
> > 
> > [ERROR] Failed to execute goal on project word-count-beam: Could not
> resolve
> > dependencies for project org.example:word-count-beam:jar:0.1: The
> following
> > artifacts could not be resolved:
> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1,
> > org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1: Failure to
> find
> >
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> > in https://repo.maven.apache.org/maven2 was cached in the local
> repository,
> > resolution will not be reattempted until the update interval of central
> has
> > elapsed or updates are forced -> [Help 1]
> >
> >   Any idea?
> >
> > Thanks,
> > Huafeng
>


Re: Failed to run Wordcount example

2017-08-16 Thread Ismaël Mejía
Hello,

The error message shows that Maven is looking for Beam version 0.1,
and that version does not exist in Maven Central.
You have to replace the version of Beam in the command you executed
with the latest version, which is 2.0.0 at the moment, and it should
work.

Regards,
Ismaël


On Wed, Aug 16, 2017 at 8:21 AM, Vincent Wang  wrote:
> Hi guys,
>
>   I'm trying to run the wordcount example according to the quick start but I
> got following error:
>
> [INFO]
> 
> [INFO] Building word-count-beam 0.1
> [INFO]
> 
> [WARNING] The POM for org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 is
> missing, no dependency information available
> [WARNING] Failed to retrieve plugin descriptor for
> org.eclipse.m2e:lifecycle-mapping:1.0.0: Plugin
> org.eclipse.m2e:lifecycle-mapping:1.0.0 or one of its dependencies could not
> be resolved: Failure to find org.eclipse.m2e:lifecycle-mapping:jar:1.0.0 in
> https://repo.maven.apache.org/maven2 was cached in the local repository,
> resolution will not be reattempted until the update interval of central has
> elapsed or updates are forced
> [WARNING] The POM for org.apache.beam:beam-sdks-java-core:jar:0.1 is
> missing, no dependency information available
> [WARNING] The POM for
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> is missing, no dependency information available
> [WARNING] The POM for
> org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1 is missing, no
> dependency information available
> [INFO]
> 
> [INFO] BUILD FAILURE
> [INFO]
> 
> [INFO] Total time: 0.960 s
> [INFO] Finished at: 2017-08-16T14:14:42+08:00
> [INFO] Final Memory: 18M/309M
> [INFO]
> 
> [ERROR] Failed to execute goal on project word-count-beam: Could not resolve
> dependencies for project org.example:word-count-beam:jar:0.1: The following
> artifacts could not be resolved:
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1,
> org.apache.beam:beam-sdks-java-extensions-protobuf:jar:0.1: Failure to find
> org.apache.beam:beam-sdks-java-extensions-google-cloud-platform-core:jar:0.1
> in https://repo.maven.apache.org/maven2 was cached in the local repository,
> resolution will not be reattempted until the update interval of central has
> elapsed or updates are forced -> [Help 1]
>
>   Any idea?
>
> Thanks,
> Huafeng