How does param desiredBundleSizeBytes of BoundedSource#split get determined at runtime?

2017-07-10 Thread Ivan
Hi, we are trying to build a custom BoundedSource based on a gRPC call. In
class BoundedSource we have the method below:


public abstract java.util.List<? extends BoundedSource<T>> split(
    long desiredBundleSizeBytes,
    PipelineOptions options)
    throws java.lang.Exception

At runtime, how is the parameter desiredBundleSizeBytes determined? Is it
different under different runners? What's the behavior in the Flink runner? What's
the relationship between this parameter and the parallelism we specify in
the FlinkPipelineOptions?
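For illustration, here is a plain-Java sketch (no Beam dependency) of what a split(desiredBundleSizeBytes, options) body might do with the value it receives. It assumes, purely for illustration, a runner that computes the hint by dividing the source's estimated size by the requested parallelism; whether the Flink runner, or any other runner, does exactly this should be checked against that runner's source. OffsetRange, assumedDesiredBundleSize and split are hypothetical names, not Beam API.

```java
import java.util.ArrayList;
import java.util.List;

public class BundleSizeSketch {

    /** Hypothetical half-open offset range [start, end) standing in for one sub-source. */
    static final class OffsetRange {
        final long start;
        final long end;

        OffsetRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** ASSUMPTION: a runner derives the hint as estimated size / requested parallelism. */
    static long assumedDesiredBundleSize(long estimatedSizeBytes, int parallelism) {
        return Math.max(1L, estimatedSizeBytes / Math.max(1, parallelism));
    }

    /** Cuts [0, totalBytes) into consecutive ranges of at most desiredBundleSizeBytes. */
    static List<OffsetRange> split(long totalBytes, long desiredBundleSizeBytes) {
        long step = Math.max(1L, desiredBundleSizeBytes);
        List<OffsetRange> bundles = new ArrayList<>();
        for (long start = 0; start < totalBytes; start += step) {
            bundles.add(new OffsetRange(start, Math.min(totalBytes, start + step)));
        }
        return bundles;
    }

    public static void main(String[] args) {
        long desired = assumedDesiredBundleSize(1_000L, 4); // 250
        System.out.println(split(1_000L, desired)); // [[0, 250), [250, 500), [500, 750), [750, 1000)]
    }
}
```

The value is best treated as a target rather than a hard limit: a source whose data cannot be cut that finely may return fewer, larger bundles. If the gRPC service exposes something offset-like, Beam's OffsetBasedSource, which implements this kind of offset splitting, may be worth a look.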




Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-10 Thread Eugene Kirpichov
I think Madhusudan's proposal does not involve reading the whole contents
of the memcached cluster - it's applied to a PCollection of keys.
So I'd suggest calling it MemcachedIO.lookup() rather than
MemcachedIO.read(). And it will not involve the questions of splitting -
however, it *will* involve snapshot consistency (looking up the same key at
different times may yield different results, including a null result).

Concur with others - please take a look at
https://beam.apache.org/documentation/io/authoring-overview/ and
https://beam.apache.org/contribute/ptransform-style-guide/ , as well as at
the code of other IO transforms. The proposed API contradicts several best
practices described in these documents, but is easily fixable.

I also recommend considering how you plan to extend this to support other
commands, and which commands you expect to ever support.
Also, I'm unsure about the usefulness of MemcachedIO.lookup(). What's an
example real-world use case for such a bulk lookup operation, where you
transform a PCollection of keys into a PCollection of key/value pairs? I
suppose such a use case exists, but I'd like to know more about it, to see
whether this is the best API for it.

On Mon, Jul 10, 2017 at 9:18 AM Lukasz Cwik 
wrote:

> Splitting on slabs should allow you to split at a finer grain than per
> server since each server itself maintains this information. If you take a
> look at the memcached protocol, you can see that lru_crawler supports a
> metadump command which will enumerate all the keys for a set of given slabs
> or for all the slabs.
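Splitting on slabs can be pictured as producing one unit of work per (server, slab class) pair instead of one per server. The sketch below is hypothetical: SlabSplit and splitBySlab are illustrative names, the server addresses and slab class IDs are made up (in practice the IDs would have to be discovered per server, for example from `stats slabs` output), and the command string only follows the `lru_crawler metadump <slab class ids>` form mentioned above; check protocol.txt for the exact syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class SlabSplitSketch {

    /** Hypothetical unit of work: one slab class on one memcached server. */
    static final class SlabSplit {
        final String server;
        final int slabClassId;

        SlabSplit(String server, int slabClassId) {
            this.server = server;
            this.slabClassId = slabClassId;
        }

        /** Command a reader for this split could send to enumerate the keys in its slab class. */
        String metadumpCommand() {
            return "lru_crawler metadump " + slabClassId;
        }
    }

    /** One split per (server, slab class) pair rather than one split per server. */
    static List<SlabSplit> splitBySlab(List<String> servers, List<Integer> slabClassIds) {
        List<SlabSplit> splits = new ArrayList<>();
        for (String server : servers) {
            for (int id : slabClassIds) {
                splits.add(new SlabSplit(server, id));
            }
        }
        return splits;
    }

    public static void main(String[] args) {
        List<SlabSplit> splits =
            splitBySlab(List.of("cache-1:11211", "cache-2:11211"), List.of(1, 2, 3));
        System.out.println(splits.size()); // 6
        System.out.println(splits.get(0).metadumpCommand()); // lru_crawler metadump 1
    }
}
```

With two servers and three slab classes this yields six splits instead of two, which is the finer granularity the message refers to.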
>
> For the consistency part, you can get a snapshot-like effect (snapshot-like
> since it's per server and not across the server farm) by combining
> the "watch mutations evictions" command on one connection with the
> "lru_crawler metadump all" on another connection to the same memcached
> server. By first connecting using a watcher and then performing a dump you
> can create two logical streams of data that can be joined to get a snapshot
> per server. If the amount of data/mutations/evictions is small, you can
> perform all of this within a DoFn; otherwise you can treat each as two
> different outputs which you join and perform the same logical operation to
> rebuild the snapshot on a per-key basis.
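The join step can be modelled as replaying the watcher's events over the dumped contents, so that the last observed event for a key wins and evicted keys drop out. This is a simplified model only: it assumes the dumped keys have already been fetched to values, that watcher events arrive in order and carry the new value, and it ignores races between the two connections, none of which is guaranteed by what memcached actually emits. Mutation and rebuild are hypothetical names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotSketch {

    /** Hypothetical watcher event: a set (value != null) or an eviction/delete (value == null). */
    static final class Mutation {
        final String key;
        final String value;

        Mutation(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Applies ordered watcher events on top of the dump; later events override earlier state. */
    static Map<String, String> rebuild(Map<String, String> dump, List<Mutation> mutations) {
        Map<String, String> snapshot = new HashMap<>(dump);
        for (Mutation m : mutations) {
            if (m.value == null) {
                snapshot.remove(m.key); // evicted or deleted while the dump was running
            } else {
                snapshot.put(m.key, m.value); // written while the dump was running
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        Map<String, String> dump = Map.of("a", "1", "b", "2");
        List<Mutation> watched = List.of(
            new Mutation("a", "10"), // "a" overwritten during the dump
            new Mutation("b", null), // "b" evicted
            new Mutation("c", "3")); // "c" added
        Map<String, String> snapshot = rebuild(dump, watched);
        System.out.println(snapshot.get("a"));         // 10
        System.out.println(snapshot.containsKey("b")); // false
        System.out.println(snapshot.get("c"));         // 3
    }
}
```

Inside a single DoFn this is a local map merge; when the data is large, the same per-key rule would be applied after grouping both outputs by key.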
>
> Interestingly, the "watch mutations" command would allow one to build a
> streaming memcache IO which shows all changes occurring underneath.
>
> memcached protocol:
> https://github.com/memcached/memcached/blob/master/doc/protocol.txt
>
> On Mon, Jul 10, 2017 at 2:41 AM, Ismaël Mejía  wrote:
>
> > Hello,
> >
> > Thanks Lukasz for bringing up some of these subjects. I have briefly
> > discussed them with the guys working on this; they are the same team who did
> > HCatalogIO (Hive).
> >
> > We just analyzed the different libraries that would allow us to develop this
> > integration from Java and decided that the most complete
> > implementation was spymemcached. One thing I really didn't like about
> > their API is that there is no abstraction for Mutation (like in
> > Bigtable/HBase), but a corresponding method for each operation, so to
> > make things easier we discussed focusing first on read/write.
> >
> > @Lukasz for the enumeration part, I am not sure I follow. We had just
> > discussed a naive approach of splitting by server: given that
> > Memcached is not a cluster but a server farm (which means every server
> > is on its own), we thought this would be the easiest way to partition. Is
> > there any technical issue that prevents this (creating a
> > BoundedSource and just reading from each server)? Or would partitioning by
> > slabs bring us a better optimization? (Note that I am far from an expert
> > on Memcached.)
> >
> > For the consistency part, I assumed it would be inconsistent when
> > reading, because I didn't know how to do the snapshot, but if you can
> > give us more details on how to do this, and why it is worth the effort
> > (vs. the cost of the snapshot), this would be something interesting to
> > integrate.
> >
> > Thanks,
> > Ismaël
> >
> >
> > On Sun, Jul 9, 2017 at 7:39 PM, Lukasz Cwik 
> > wrote:
> > > For the source:
> > > Do you plan to support enumerating all the keys via cachedump /
> > > lru_crawler metadump / ...?
> > > If there is an option which doesn't require enumerating the keys, how
> > > will splitting be done (no splitting / splitting on slab ids / ...)?
> > > Can the cache be read while it's still being modified (will a snapshot
> > > effectively be made using a watcher, or is it expected that the cache
> > > will be read only or inconsistent when reading)?
> > >
> > > Also, as a usability point, all PTransforms are meant to be applied to
> > > PCollections and not vice versa.
> > > e.g.
> > > PCollection keys = ...;
> > > keys.apply(MemCacheIO.withConfig());
> > >
> > > This makes it so that people can write:
> > > PCollection<...> output 

Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Sorry, just saw https://github.com/apache/beam/pull/2211

On Mon, Jul 10, 2017 at 5:37 PM, Robert Bradshaw  wrote:
> Any progress on this?
Re: [BEAM-135] Utilities for "batching" elements in a DoFn

2017-07-10 Thread Robert Bradshaw
Any progress on this?

On Thu, Mar 9, 2017 at 1:43 AM, Etienne Chauchot  wrote:
> Hi all,
>
> We had a discussion with Kenn yesterday about point 1 below, and I would like
> to note it here on the ML:
>
> Using new method timer.set() instead of timer.setForNowPlus() makes the
> timer fire at the right time.
>
> Another thing, regarding point 2: if I inject the window in the @OnTimer
> method and print it, I see that at the moment the timer fires (at the last
> timestamp of the window), the window is the GlobalWindow. I guess that is
> because the fixed window has just ended. Maybe the empty bagState that I get
> here is due to the end of the window (passing to the GlobalWindow). I mean, as
> the states are scoped per window, and the window is different, another
> bagState instance gets injected. Hence the empty bagState.
>
> WDYT?
>
> I will open a PR even if this work is not finished yet, that way, we will
> have a convenient environment for discussing this code.
>
> Etienne
>
>
> Le 03/03/2017 à 11:48, Etienne Chauchot a écrit :
>>
>> Hi all,
>>
>> @Kenn: I have enhanced my streaming test in
>> https://github.com/echauchot/beam/tree/BEAM-135-BATCHING-PARDO in particular
>> to check that BatchingParDo doesn't mess up windows. It seems that it
>> actually does :)
>>
>> The input collection contains 10 elements timestamped at 1s interval and
>> it is divided into fixed windows of 5s duration (so 2 windows). startTime is
>> epoch. The timer is used to detect the end of the window and output the
>> content of the batch (buffer) then.
>>
>> I added some logs and I noticed two strange things (that might be linked):
>>
>>
>> 1-The timer is set twice, and it is set correctly
>>
>> INFOS: * SET TIMER * Delay of 4999 ms added to timestamp
>> 1970-01-01T00:00:00.000Z set for window
>> [1970-01-01T00:00:00.000Z..1970-01-01T00:00:05.000Z)
>>
>> INFOS: * SET TIMER * Delay of 4999 ms added to timestamp
>> 1970-01-01T00:00:05.000Z set for window
>> [1970-01-01T00:00:05.000Z..1970-01-01T00:00:10.000Z)
>>
>> It correctly fires twice but not at the right timestamp:
>>
>> INFOS: * END OF WINDOW * for timer timestamp
>> 1970-01-01T00:00:04.999Z
>>
>> =>Correct
>>
>> INFOS: * END OF WINDOW * for timer timestamp
>> 1970-01-01T00:00:04.999Z
>>
>> => Incorrect (should fire at timestamp 1970-01-01T00:00:09.999Z)
>>
>> Do I need to call timer.cancel() after the timer has fired? But
>> timer.cancel() is not supported by DirectRunner yet.
>>
>>
>>
>> 2- In the @OnTimer method, the injected batch bagState parameter is empty,
>> even though elements were added to it since the last batch.clear() while
>> processing the same window
>>
>> INFOS: * BATCH * clear
>>
>> INFOS: * BATCH * Add element for window
>> [1970-01-01T00:00:00.000Z..1970-01-01T00:00:05.000Z)
>>
>> INFOS: * BATCH * Add element for window
>> [1970-01-01T00:00:00.000Z..1970-01-01T00:00:05.000Z)
>> ..
>> INFOS: * END OF WINDOW * for timer timestamp
>> 1970-01-01T00:00:04.999Z
>> INFOS: * IN ONTIMER * batch size 0
>>
>> Am I doing something wrong with timers or is there something not totally
>> finished with them (as you noticed they are quite new)?
>>
>> WDYT?
>>
>>
>> Thanks
>>
>> Etienne
>>
>>
>> Le 09/02/2017 à 09:55, Etienne Chauchot a écrit :
>>>
>>> Hi,
>>>
>>> @JB: good to know for the roadmap! thanks
>>>
>>> @Kenn: just to be clear: the timer fires fine. What I noticed is that it
>>> seems to be SET more than once because timer.setForNowPlus is called in the
>>> @ProcessElement method. I am not 100% sure of it, but what I noticed is that it
>>> started to work fine when I made sure to call timer.setForNowPlus only once. I
>>> don't say it's a bug, this is just not what I understood when I read the
>>> javadoc; I understood that it would be set only once per window, see the
>>> javadoc below:
>>>
>>> An implementation of Timer is implicitly scoped - it may be scoped to a
>>> key and window, or a key, window, and trigger, etc.
>>> A timer exists in one of two states: set or unset. A timer can be set
>>> only for a single time per scope.
>>>
>>> I use the DirectRunner.
>>>
>>> For the key part: ok, makes sense.
>>>
>>> Thanks for your comments
>>>
>>> I'm leaving on vacation tonight for a little more than two weeks, I'll
>>> resume work then, maybe start a PR when it's ready.
>>>
>>> Etienne
>>>
>>>
>>>
>>> Le 08/02/2017 à 19:48, Kenneth Knowles a écrit :

 Hi Etienne,

 If the timer is firing n times for n elements, that's a bug in the runner /
 shared runner code. It should be deduped. Which runner? Can you file a JIRA
 against me to investigate? I'm still in the process of fleshing out more
 and more RunnableOnService (aka ValidatesRunner) tests so I will surely add
 one (existing tests already OOMed without deduping, so it wasn't at the top
 of my priority list).

 If the end user doesn't have a natural key, I would just add one and
 

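The expected firing times in Etienne's test follow from fixed-window arithmetic: with 10 elements 1 s apart starting at the epoch and 5 s fixed windows, the windows are [0 s, 5 s) and [5 s, 10 s), whose last timestamps are 00:00:04.999Z and 00:00:09.999Z. The plain-Java sketch below is not the Beam timer API; windowStart, maxTimestamp and the map are illustrative. It computes those targets and models the javadoc rule that a timer "can be set only for a single time per scope" as one map entry per window, so setting it again for every element overwrites the entry instead of adding a second timer.

```java
import java.util.Map;
import java.util.TreeMap;

public class FixedWindowTimerSketch {

    static final long WINDOW_SIZE_MS = 5_000L;

    /** Start of the epoch-aligned fixed window containing timestampMs. */
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_SIZE_MS);
    }

    /** Last millisecond inside [windowStart, windowStart + size), i.e. end - 1 ms. */
    static long maxTimestamp(long windowStart) {
        return windowStart + WINDOW_SIZE_MS - 1;
    }

    public static void main(String[] args) {
        // Timer target keyed by window: re-setting for the same window replaces the entry.
        Map<Long, Long> timerByWindow = new TreeMap<>();
        for (long t = 0; t < 10_000L; t += 1_000L) { // 10 elements, 1 s apart
            long start = windowStart(t);
            timerByWindow.put(start, maxTimestamp(start));
        }
        System.out.println(timerByWindow); // {0=4999, 5000=9999}
    }
}
```

Computing an absolute target from the window, rather than adding a fixed delay to a per-element time, is what makes both windows end up with distinct firing times here.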
Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-10 Thread Kenneth Knowles
Have we heard anything about the remaining issues on
https://s.apache.org/beam-2.1.0-burndown? Can we move them all to the
following release?

On Mon, Jul 10, 2017 at 1:22 PM, Jean-Baptiste Onofré 
wrote:

> Hi all,
>
> all cherry-pick PRs have been merged on the release-2.1.0 branch.
>
> I'm launching a couple of builds and tests. I will cut the RC1 just after.
> Stay tuned for the vote e-mail! ;)
>
> Regards
> JB
>
Re: [DISCUSS] Apache Beam 2.1.0 release next week ?

2017-07-10 Thread Jean-Baptiste Onofré

Hi all,

all cherry-pick PRs have been merged on the release-2.1.0 branch.

I'm launching a couple of builds and tests. I will cut the RC1 just after. Stay
tuned for the vote e-mail! ;)


Regards
JB

On 07/06/2017 06:43 AM, Jean-Baptiste Onofré wrote:
No problem, just set the fix version in Jira to 2.1.0 and I will wait until
all Jira issues with this version are fixed before cutting the RC1.


Thanks !
Regards
JB

On 07/06/2017 05:48 AM, Kenneth Knowles wrote:

+1 to these and IMO we should treat all of the remaining 10 items on the
burndown. I think all but one or two are in late-stage PR right now. It
should be easy to merge them before July 10.

On Wed, Jul 5, 2017 at 4:28 PM, Raghu Angadi 
wrote:


I would like to request merging two Kafka-related PRs: #3461 and #3492.
Especially the second one, as it improves the user experience in case of
server misconfiguration that prevents connections between workers and the
Kafka cluster.

On Wed, Jul 5, 2017 at 8:10 AM, Jean-Baptiste Onofré 
wrote:


FYI, the release branch has been created.

I plan to do the RC1 tomorrow, so you have time to cherry-pick if wanted ;)


Regards
JB


On 07/05/2017 07:52 AM, Jean-Baptiste Onofré wrote:


Hi,

I'm building with the latest changes and I will cut the release branch
just after.

I keep you posted.

Regards
JB

On 07/03/2017 05:37 PM, Jean-Baptiste Onofré wrote:


Hi guys,

The 2.1.0 release branch will be created in an hour or so.

I updated Jira; please take a look and review the ones assigned to you
where I left a comment.

Thanks !
Regards
JB

On 07/01/2017 07:06 AM, Jean-Baptiste Onofré wrote:


It sounds good Kenn. Thanks.

I will ask in the Jira.

Thanks !
Regards
JB

On 07/01/2017 06:58 AM, Kenneth Knowles wrote:


SGTM

There are still 23 open issues tagged with 2.1.0. Since this number has not
decreased since last time, I think it is fair to ask for them to be
cherry-picked to the release branch or deferred.

To the assignees of these issues: can you please evaluate whether
completion is imminent?

I also want to note that many PMC members have Monday and Tuesday off,
providing a strong incentive to take the whole week off. So I suggest
July 10 as the earliest day for RC1.

On Fri, Jun 30, 2017 at 8:53 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi,

The build is now back to normal; I will create the release branch
today.

Regards
JB


On 06/29/2017 03:22 PM, Jean-Baptiste Onofré wrote:

FYI,

I opened https://github.com/apache/beam/pull/3471 to fix the SpannerIO
test on my machine. I don't understand how the test can pass without
defining the project ID (it should always fail on the precondition).

I will create the release branch once this PR is merged.

Regards
JB

On 06/29/2017 06:29 AM, Jean-Baptiste Onofré wrote:

Hi Stephen,

Thanks for the update.

I have an issue on my machine with SpannerIOTest. I will create the
release branch as soon as this is fixed. Then, we will be able to
cherry-pick the fix we want.

I keep you posted.

Regards
JB

On 06/28/2017 09:37 PM, Stephen Sisk wrote:

hi!

I'm hopeful we can get the fix for BEAM-2533 into this release as well;
there's a Bigtable fix in the next version that'd be good to have. The
Bigtable client release should be in the next day or two.

S

On Mon, Jun 26, 2017 at 12:03 PM Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

Hi guys,

Just a quick update about the 2.1.0 release.

I will complete the Jira triage tomorrow.

I plan to create the release branch Wednesday.

Thanks !
Regards
JB

On 06/22/2017 04:23 AM, Jean-Baptiste Onofré wrote:

Hi guys,


As we released 2.0.0 (first stable release) last month during
ApacheCon,

and to


maintain our release pace, I would like to release 2.1.0 next

week.

This release would include lot of bug fixes and some new
features:

https://issues.apache.org/jira/projects/BEAM/versions/12340528

I'm volunteer to be release manager for this one.

Thoughts ?

Thanks,
Regards
JB



--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


Re: MergeBot is here!

2017-07-10 Thread Jason Kuster
(quick update re #2 above): ~4 minutes after I reopened the ticket, it's
fixed.
https://github.com/apache/infrastructure-puppet/commit/709944291da5e8aea711cb8578f0594deb45e222
updates the website to the correct address. Infra is once again the best.

On Mon, Jul 10, 2017 at 12:38 PM, Jason Kuster 
wrote:

> Glad to hear everyone's pretty happy about it! Have a couple answers for
Re: MergeBot is here!

2017-07-10 Thread Jason Kuster
Glad to hear everyone's pretty happy about it! Have a couple answers for
your questions.

Ted: I believe the MFA stuff (two-factor auth on github) is necessary for
getting the additional features on GitHub (reviewer, etc), but may not be
necessary for MergeBot. I'll check in with Infra and get back to you.

Ismaël: Great questions! Answered below.

1. The code will likely be transitioned over to an Infra-controlled
repository, but for now is under my account:
https://github.com/jasonkuster/merge-bot. It's written in Python, so Python
aficionados especially feel free to take a look, kick the tires, and open
PRs.

2. Glad to hear mergebot worked for you. :) The website not showing the changes
appears to be an issue with the transition to GitBox; it seems a reference may
not have been updated. Thanks for the report! I've reopened
https://issues.apache.org/jira/browse/INFRA-14405 to track.

3. I'd love to chat about this more! It's totally possible to have mergebot
pause and show the status of the repository before it does the final push,
but given that mergebot is merging PRs serially I don't want to have
someone forget to click "ok" and block other people's PRs. One other option
would be to allow the person requesting the merge to say something like
"@asfgit merge squash" or "@asfgit merge nosquash", parametrizing the merge
request. Thoughts?
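The per-request option Jason suggests amounts to a tiny comment grammar. The sketch below is hypothetical: MergeBot itself is written in Python and the thread does not show how it parses commands, so MergeMode, parse, and using NO_SQUASH as the default for a bare "@asfgit merge" (mirroring Ismaël's observation that commits were not squashed) are illustrative assumptions.

```java
import java.util.Locale;
import java.util.Optional;

public class MergeCommandSketch {

    enum MergeMode { SQUASH, NO_SQUASH }

    /**
     * Parses a PR comment such as "@asfgit merge squash". A bare "@asfgit merge"
     * falls back to defaultMode; anything else is not treated as a merge command.
     */
    static Optional<MergeMode> parse(String comment, MergeMode defaultMode) {
        String[] parts = comment.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (parts.length < 2 || !parts[0].equals("@asfgit") || !parts[1].equals("merge")) {
            return Optional.empty();
        }
        if (parts.length == 2) {
            return Optional.of(defaultMode);
        }
        if (parts.length == 3 && parts[2].equals("squash")) {
            return Optional.of(MergeMode.SQUASH);
        }
        if (parts.length == 3 && parts[2].equals("nosquash")) {
            return Optional.of(MergeMode.NO_SQUASH);
        }
        return Optional.empty(); // unknown option: refuse rather than guess
    }

    public static void main(String[] args) {
        System.out.println(parse("@asfgit merge squash", MergeMode.NO_SQUASH)); // Optional[SQUASH]
        System.out.println(parse("@asfgit merge", MergeMode.NO_SQUASH));        // Optional[NO_SQUASH]
        System.out.println(parse("LGTM", MergeMode.NO_SQUASH));                 // Optional.empty
    }
}
```

Rejecting unknown options instead of falling back to the default keeps a typo such as "@asfgit merge sqush" from silently merging in the wrong mode, which matters when merges are queued serially.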

On Mon, Jul 10, 2017 at 10:52 AM, Mark Liu 
wrote:

> +1 Awesome work!
>
> Thank you Jason!!!
>
> Mark
>
> On Mon, Jul 10, 2017 at 10:05 AM, Robert Bradshaw <
> rober...@google.com.invalid> wrote:
>
> > +1, this is great! I'll second Ismaël's requests, especially 1 and 3.
> >
> > On Mon, Jul 10, 2017 at 2:09 AM, Ismaël Mejía  wrote:
> > > Excellent! Automation of such repetitive (and error-prone) tasks is
> > > strongly welcomed.
> > >
> > > Thanks for making this happen Jason!
> > >
> > > Some comments:
> > >
> > > 1. I suppose the code of mergebot is now part of Apache Infra, no? Do
> > > you know exactly where the code is hosted? And what is the procedure
> > > in case somebody wants to improve it or change something in the
> > > future? I suppose other projects could benefit from this.
> > >
> > > 2. I configured and used the mergebot successfully; however, the
> > > website does not reflect the changes of the PR I 'merged'. I suppose
> > > there are still some things we have to fix, because the changes are
> > > not there.
> > > (The PR I am talking about is https://github.com/apache/beam-site/pull/264)
> > >
> > > 3. Another thing I noticed is that the mergebot didn't squash the
> > > commits (this probably makes sense), and I didn't realize I should do it
> > > beforehand because there is no preview of the actions that
> > > the mergebot is going to perform. Can this eventually be improved? (I don't
> > > know if this makes sense, because it would add an extra validation
> > > step and we must trust robots anyway :P).
> > >
> > > This is something that reviewers/committers must remember, and on
> > > that note we need to update the contribution guide to include
> > > instructions for configuring and using the mergebot.
> > >
> > > Thanks again Jason and the others who made this possible; this is great!
> > > Ismaël
> > >
> > > ps. I’m eager to see this included too for the beam project.
> > >
> > > On Sat, Jul 8, 2017 at 7:28 AM, tarush grover
> > > wrote:
> > >> This is really good!!
> > >>
> > >> Regards,
> > >> Tarush
> > >>
> > >> On Sat, 8 Jul 2017 at 10:20 AM, Jean-Baptiste Onofré
> > >> wrote:
> > >>
> > >>> That's awesome !
> > >>>
> > >>> Thanks Jason !
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>> On 07/07/2017 10:21 PM, Jason Kuster wrote:
> > >>> > Hi Beam Community,
> > >>> >
> > >>> > Early on in the project, we had a number of discussions about creating an
> > >>> > automated tool for merging pull requests. I’m happy to announce that we’ve
> > >>> > developed such a tool and it is ready for experimental usage in Beam!
> > >>> >
> > >>> > The tool, MergeBot, works in conjunction with ASF’s existing GitBox tool,
> > >>> > providing numerous benefits:
> > >>> > * Automating the merge process -- instead of many manual steps with
> > >>> > multiple Git remotes, merging is as simple as commenting a specific command
> > >>> > in GitHub.
> > >>> > * Automatic verification of each pull request against the latest master
> > >>> > code before merge.
> > >>> > * Merge queue enforces an ordering of pull requests, which ensures that
> > >>> > pull requests that have bad interactions don’t get merged at the same time.
> > >>> > * GitBox-enabled features such as reviewers, assignees, and labels.
> > >>> > * Enabling enhanced use of tools like reviewable.io.
> > >>> >
> > >>> > If you are a committer, the first step is to link your Apache and GitHub
> > >>> > accounts at 

Re: BEAM-934 - Jira permission and pull request

2017-07-10 Thread Apache Enthu
Thanks Kenn.

Thanks,
Almas

On 11 Jul 2017 00:05, "Kenneth Knowles"  wrote:

> I've added you as a Contributor, which is the role you will need to assign
> issues.
>
> On Mon, Jul 10, 2017 at 11:12 AM, Apache Enthu 
> wrote:
>
> > Hi could you please add me (eralmas7) as committer please?
> >
> > Thanks,
> > Almas
> >
> > On Mon, Jul 10, 2017 at 8:57 AM, Kenneth Knowles
> > wrote:
> >
> > > Just a tiny correction - I think the JIRA role "contributor" for the Beam
> > > can take JIRAs without a committer assigning to them. But definitely you
> > > _must_ have this role or even a committer cannot give you a JIRA.
> > >
> > > What is your JIRA account, so I can add you as a contributor?
> > >
> > > Kenn
> > >
> > > On Sun, Jul 9, 2017 at 12:18 PM, Jean-Baptiste Onofré
> > > wrote:
> > >
> > > > By Jira ID, I mean YOUR account ID.
> > > >
> > > > Committer is a long process: https://beam.apache.org/contribute/contribution-guide/#granting-more-rights-to-a-contributor
> > > >
> > > > I suggest to take a look at how Apache works:
> > > >
> > > > http://www.apache.org/
> > > >
> > > > Regards
> > > > JB
> > > >
> > > >
> > > > On 07/09/2017 07:47 PM, Apache Enthu wrote:
> > > >
> > > >> Thanks JB. Jira Id is in Subject BEAM-934.
> > > >> https://issues.apache.org/jira/browse/BEAM-934
> > > >>
> > > >> How do i get added as committer please? Or are there any criteria for me
> > > >> to be added to as Committer?
> > > >>
> > > >> Thanks,
> > > >> Almas
> > > >>
> > > >> On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré <j...@nanthrax.net>
> > > >> wrote:
> > > >>
> > > >> Hi,
> > > >>>
> > > >>> you have to be committer to do the assignment.
> > > >>>
> > > >>> If you provide your Jira ID, I will assign the Jira to you.
> > > >>>
> > > >>> Regards
> > > >>> JB
> > > >>>
> > > >>>
> > > >>> On 07/09/2017 08:30 AM, Apache Enthu wrote:
> > > >>>
> > > >>> Hi,
> > > 
> > >  I'm newbie in this project and i have picked up simple jira from the
> > >  open jira list.
> > > 
> > >  It seems i don't have permission to assign jira to myself and move it
> > >  through its lifecycle.
> > > 
> > >  I have created the pull request https://github.com/apache/beam/pull/3526
> > > 
> > >  Could you please let me know how could i get permission in Jira. Also
> > >  please could you approve my pull request.
> > > 
> > >  Thanks,
> > >  Almas
> > > 
> > > 
> > >  --
> > > >>> Jean-Baptiste Onofré
> > > >>> jbono...@apache.org
> > > >>> http://blog.nanthrax.net
> > > >>> Talend - http://www.talend.com
> > > >>>
> > > >>>
> > > >>
> > > > --
> > > > Jean-Baptiste Onofré
> > > > jbono...@apache.org
> > > > http://blog.nanthrax.net
> > > > Talend - http://www.talend.com
> > > >
> > >
> >
>


Re: BEAM-934 - Jira permission and pull request

2017-07-10 Thread Kenneth Knowles
I've added you as a Contributor, which is the role you will need to assign
issues.

On Mon, Jul 10, 2017 at 11:12 AM, Apache Enthu 
wrote:

> Hi could you please add me (eralmas7) as committer please?
>
> Thanks,
> Almas
>
> On Mon, Jul 10, 2017 at 8:57 AM, Kenneth Knowles 
> wrote:
>
> > Just a tiny correction - I think the JIRA role "contributor" for the Beam
> > can take JIRAs without a committer assigning to them. But definitely you
> > _must_ have this role or even a committer cannot give you a JIRA.
> >
> > What is your JIRA account, so I can add you as a contributor?
> >
> > Kenn
> >
> > On Sun, Jul 9, 2017 at 12:18 PM, Jean-Baptiste Onofré 
> > wrote:
> >
> > > By Jira ID, I mean YOUR account ID.
> > >
> > > Committer is a long process: https://beam.apache.org/contribute/contribution-guide/#granting-more-rights-to-a-contributor
> > >
> > > I suggest to take a look at how Apache works:
> > >
> > > http://www.apache.org/
> > >
> > > Regards
> > > JB
> > >
> > >
> > > On 07/09/2017 07:47 PM, Apache Enthu wrote:
> > >
> > >> Thanks JB. Jira Id is in Subject BEAM-934.
> > >> https://issues.apache.org/jira/browse/BEAM-934
> > >>
> > >> How do i get added as committer please? Or are there any criteria for me
> > >> to be added to as Committer?
> > >>
> > >> Thanks,
> > >> Almas
> > >>
> > >> On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré
> > >> wrote:
> > >>
> > >> Hi,
> > >>>
> > >>> you have to be committer to do the assignment.
> > >>>
> > >>> If you provide your Jira ID, I will assign the Jira to you.
> > >>>
> > >>> Regards
> > >>> JB
> > >>>
> > >>>
> > >>> On 07/09/2017 08:30 AM, Apache Enthu wrote:
> > >>>
> > >>> Hi,
> > 
> >  I'm newbie in this project and i have picked up simple jira from the
> >  open
> >  jira list.
> > 
> >  It seems i don't have permission to assign jira to myself and move it
> >  through its lifecycle.
> > 
> >  I have created the pull request https://github.com/apache/beam/pull/3526
> > 
> >  Could you please let me know how could i get permission in Jira. Also
> >  please could you approve my pull request.
> > 
> >  Thanks,
> >  Almas
> > 
> > 
> >  --
> > >>> Jean-Baptiste Onofré
> > >>> jbono...@apache.org
> > >>> http://blog.nanthrax.net
> > >>> Talend - http://www.talend.com
> > >>>
> > >>>
> > >>
> > > --
> > > Jean-Baptiste Onofré
> > > jbono...@apache.org
> > > http://blog.nanthrax.net
> > > Talend - http://www.talend.com
> > >
> >
>


Re: BEAM-934 - Jira permission and pull request

2017-07-10 Thread Apache Enthu
Hi could you please add me (eralmas7) as committer please?

Thanks,
Almas

On Mon, Jul 10, 2017 at 8:57 AM, Kenneth Knowles 
wrote:

> Just a tiny correction - I think the JIRA role "contributor" for the Beam
> can take JIRAs without a committer assigning to them. But definitely you
> _must_ have this role or even a committer cannot give you a JIRA.
>
> What is your JIRA account, so I can add you as a contributor?
>
> Kenn
>
> On Sun, Jul 9, 2017 at 12:18 PM, Jean-Baptiste Onofré 
> wrote:
>
> > By Jira ID, I mean YOUR account ID.
> >
> > Committer is a long process: https://beam.apache.org/contribute/contribution-guide/#granting-more-rights-to-a-contributor
> >
> > I suggest to take a look at how Apache works:
> >
> > http://www.apache.org/
> >
> > Regards
> > JB
> >
> >
> > On 07/09/2017 07:47 PM, Apache Enthu wrote:
> >
> >> Thanks JB. Jira Id is in Subject BEAM-934.
> >> https://issues.apache.org/jira/browse/BEAM-934
> >>
> >> How do i get added as committer please? Or are there any criteria for me
> >> to be added to as Committer?
> >>
> >> Thanks,
> >> Almas
> >>
> >> On Sun, Jul 9, 2017 at 5:40 PM, Jean-Baptiste Onofré 
> >> wrote:
> >>
> >> Hi,
> >>>
> >>> you have to be committer to do the assignment.
> >>>
> >>> If you provide your Jira ID, I will assign the Jira to you.
> >>>
> >>> Regards
> >>> JB
> >>>
> >>>
> >>> On 07/09/2017 08:30 AM, Apache Enthu wrote:
> >>>
> >>> Hi,
> 
>  I'm newbie in this project and i have picked up simple jira from the
>  open jira list.
> 
>  It seems i don't have permission to assign jira to myself and move it
>  through its lifecycle.
> 
>  I have created the pull request https://github.com/apache/beam/pull/3526
> 
>  Could you please let me know how could i get permission in Jira. Also
>  please could you approve my pull request.
> 
>  Thanks,
>  Almas
> 
> 
>  --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
> >>>
> >>
> > --
> > Jean-Baptiste Onofré
> > jbono...@apache.org
> > http://blog.nanthrax.net
> > Talend - http://www.talend.com
> >
>


Re: MergeBot is here!

2017-07-10 Thread Mark Liu
+1 Awesome work!

Thank you Jason!!!

Mark

On Mon, Jul 10, 2017 at 10:05 AM, Robert Bradshaw <
rober...@google.com.invalid> wrote:

> +1, this is great! I'll second Ismaël's list requests, especially 1 and 3.
>
> On Mon, Jul 10, 2017 at 2:09 AM, Ismaël Mejía  wrote:
> > Excellent!, Automation of such repetitive (and error-prone) tasks is
> > strongly welcomed.
> >
> > Thanks for making this happen Jason!
> >
> > Some comments:
> >
> > 1. I suppose the code of mergebot is now part of Apache Infra, no? Do
> > you know exactly where the code is hosted? And what is the procedure
> > in case somebody wants to improve it or change something in the
> > future? I suppose other projects can/would benefit of this.
> >
> > 2. I configured and used the mergebot with success, however the
> > website does not reflect the changes of the PR I 'merged', I suppose
> > there are still some things we have to fix, because the changes are
> > not there.
> > (The PR I am talking about is https://github.com/apache/beam-site/pull/264)
> >
> > 3. Other thing I noticed is that the mergebot didn’t squash the
> > commits (this probably makes sense) and I didn’t realize this to do it
> > before because there is not a preview of the state of the actions that
> > the mergebot is going to do, can this eventually be improved? (I don’t
> > know if this makes sense because this will add an extra validation
> > step and we must trust robots anyway :P).
> >
> > This new issue is something that reviewers/committers must remember,
> > and talking about this we need to update this in the contribution
> > guide to include the configuration/use of the mergebot instructions.
> >
> > Thanks again Jason and the other who made this possible, this is great!
> > Ismaël
> >
> > ps. I’m eager to see this included too for the beam project.
> >
> > On Sat, Jul 8, 2017 at 7:28 AM, tarush grover 
> wrote:
> >> This is really good!!
> >>
> >> Regards,
> >> Tarush
> >>
> >> On Sat, 8 Jul 2017 at 10:20 AM, Jean-Baptiste Onofré 
> >> wrote:
> >>
> >>> That's awesome !
> >>>
> >>> Thanks Jason !
> >>>
> >>> Regards
> >>> JB
> >>>
> >>> On 07/07/2017 10:21 PM, Jason Kuster wrote:
> >>> > Hi Beam Community,
> >>> >
> >>> > Early on in the project, we had a number of discussions about creating an
> >>> > automated tool for merging pull requests. I’m happy to announce that we’ve
> >>> > developed such a tool and it is ready for experimental usage in Beam!
> >>> >
> >>> > The tool, MergeBot, works in conjunction with ASF’s existing GitBox tool,
> >>> > providing numerous benefits:
> >>> > * Automating the merge process -- instead of many manual steps with
> >>> > multiple Git remotes, merging is as simple as commenting a specific command
> >>> > in GitHub.
> >>> > * Automatic verification of each pull request against the latest master
> >>> > code before merge.
> >>> > * Merge queue enforces an ordering of pull requests, which ensures that
> >>> > pull requests that have bad interactions don’t get merged at the same time.
> >>> > * GitBox-enabled features such as reviewers, assignees, and labels.
> >>> > * Enabling enhanced use of tools like reviewable.io.
> >>> >
> >>> > If you are a committer, the first step is to link your Apache and GitHub
> >>> > accounts at http://gitbox.apache.org/setup. Once the accounts are linked,
> >>> > you should have immediate access to new GitHub features like labels,
> >>> > assignees, etc., as well as the ability to merge pull requests by simply
> >>> > commenting “@asfgit merge” on the pull request. MergeBot will communicate
> >>> > its status back to you via the same mechanism used already by Jenkins.
> >>> >
> >>> > This functionality is currently enabled for the “beam-site” repository only.
> >>> > In this phase, we’d like to gather feedback and improve the user experience
> >>> > -- so please comment back early and often. Once we are happy with the
> >>> > experience, we’ll deploy it on the main Beam repository, and recommend it
> >>> > for wider adoption.
> >>> >
> >>> > I’d like to give a huge thank you to the Apache Infrastructure team,
> >>> > especially Daniel Pono Takamori, Daniel Gruno, and Chris Thistlethwaite who
> >>> > were instrumental in bringing this project to fruition. Additionally, this
> >>> > could not have happened without the extensive work Davor put in to keep
> >>> > things moving along. Thank you Davor.
> >>> >
> >>> > Looking forward to hearing your comments and feedback. Thanks.
> >>> >
> >>> > Jason
> >>> >
> >>>
> >>> --
> >>> Jean-Baptiste Onofré
> >>> jbono...@apache.org
> >>> http://blog.nanthrax.net
> >>> Talend - http://www.talend.com
> >>>
>


Re: BEAM-933 - Not reproduceable

2017-07-10 Thread Kenneth Knowles
I believe you can create a JIRA without any special permissions. Here's a
direct link that I think will work:
https://issues.apache.org/jira/secure/CreateIssue!default.jspa

Kenn

On Mon, Jul 10, 2017 at 10:10 AM, Kenneth Knowles  wrote:

> Well, if it is not reproducible then could you issue a pull request
> deleting that bit of the pom.xml? That would resolve the issue, too.
>
> Kenn
>
> On Mon, Jul 10, 2017 at 10:01 AM, Apache Enthu 
> wrote:
>
>> Thanks Kenneth. Unfortunately i'm still unable to reproduce the issue. Did
>> anyone had a chance to look at the other issue that i raised in my mail?
>> Unfortunately as i am not a committer and hence am assuming i wont be
>> entitle to create jira for the same.
>>
>> [INFO] --- maven-checkstyle-plugin:2.17:check (default) @
>> beam-examples-java ---
>> [INFO] Starting audit...
>> Audit done.
>> [INFO]
>> [INFO] >>> findbugs-maven-plugin:3.0.4:check (default) > :findbugs @
>> beam-examples-java >>>
>> [INFO]
>> [INFO] --- findbugs-maven-plugin:3.0.4:findbugs (findbugs) @
>> beam-examples-java ---
>> [INFO] Downloading:
>> http://nexus.codehaus.org/snapshots/org/apache/beam/beam-sdks-java-build-tools/2.2.0-SNAPSHOT/maven-metadata.xml
>> [WARNING] Could not transfer metadata
>> org.apache.beam:beam-sdks-java-build-tools:2.2.0-SNAPSHOT/maven-metadata.xml
>> from/to codehaus-snapshots (http://nexus.codehaus.org/snapshots/):
>> nexus.codehaus.org
>> [INFO]
>> [INFO] <<< findbugs-maven-plugin:3.0.4:check (default) < :findbugs @
>> beam-examples-java <<<
>> [INFO]
>> [INFO] --- findbugs-maven-plugin:3.0.4:check (default) @
>> beam-examples-java ---
>> [INFO]
>> [INFO] --- maven-surefire-plugin:2.20:test (default-test) @
>> beam-examples-java ---
>>
>>
>> On Mon, Jul 10, 2017 at 9:09 AM, Kenneth Knowles 
>> wrote:
>>
>> > I think the key line you will want to change is here:
>> > https://github.com/apache/beam/blob/master/examples/java/pom.xml#L375
>> >
>> > On Sun, Jul 9, 2017 at 12:17 AM, Apache Enthu 
>> > wrote:
>> >
>> > > Hi,
>> > >
>> > > Is BEAM-933 already fixed? I'm unable to reproduce the bug by running maven
>> > > build. Here's what i see:
>> > >
>> > > [INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @
>> > > beam-examples-java ---
>> > > [INFO] Changes detected - recompiling the module!
>> > > [INFO] Compiling 15 source files to
>> > > C:\workspace-apache\beam\examples\java\target\test-classes
>> > > [INFO]
>> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
>> > > C:\workspace-apache\beam\examples\java\src\test\java\org\apache\beam\examples\WindowedWordCountIT.java
>> > > uses or overrides a deprecated API.
>> > > [INFO]
>> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
>> > > Recompile with -Xlint:deprecation for details.
>> > > [INFO]
>> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
>> > > Some input files use unchecked or unsafe operations.
>> > > [INFO]
>> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
>> > > Recompile with -Xlint:unchecked for details.
>> > > [INFO]
>> > > [INFO] --- maven-checkstyle-plugin:2.17:check (default) @
>> > > beam-examples-java ---
>> > > [INFO] Starting audit...
>> > > Audit done.
>> > > [INFO]
>> > > [INFO] --- maven-surefire-plugin:2.20:test (default-test) @
>> > > beam-examples-java ---
>> > > [INFO]
>> > >
>> > > Could you please check and let me know, so we could close this issue.
>> > >
>> > > Also there seems to be an issue with DebuggingWordCountTest, running on
>> > > Windows. It says:
>> > >
>> > > org.apache.beam.sdk.Pipeline$PipelineExecutionException:
>> > > java.lang.IllegalStateException: Unable to find registrar for c
>> > > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
>> > > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
>> > > at org.apache.beam.examples.DebuggingWordCount.main(DebuggingWordCount.java:160)
>> > > at org.apache.beam.examples.DebuggingWordCountTest.testDebuggingWordCount(DebuggingWordCountTest.java:53)
>> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > > at java.lang.reflect.Method.invoke(Method.java:498)
>> > > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>> > > at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
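
A plausible explanation for "Unable to find registrar for c", offered as an assumption rather than a confirmed diagnosis: if a file path is inspected for a URI-style scheme before a file system is chosen, a Windows path such as `C:\workspace-apache\...` looks as if it has the scheme `c` (the drive letter followed by a colon). The Java sketch below uses a hypothetical `parseScheme` helper, not Beam's actual implementation, with the RFC 3986 scheme syntax to show the effect.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (NOT Beam's implementation): extracts a URI-style scheme
// from a path spec, defaulting to "file" when no scheme is present.
class SchemeSniffer {

  // RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ':'
  private static final Pattern SCHEME =
      Pattern.compile("^([a-zA-Z][-a-zA-Z0-9+.]*):.*", Pattern.DOTALL);

  static String parseScheme(String spec) {
    Matcher m = SCHEME.matcher(spec);
    return m.matches() ? m.group(1).toLowerCase(Locale.ROOT) : "file";
  }

  public static void main(String[] args) {
    System.out.println(parseScheme("gs://bucket/counts.txt")); // gs
    System.out.println(parseScheme("/tmp/counts.txt"));        // file
    // The drive letter "C" matches the scheme grammar, so this yields "c":
    System.out.println(parseScheme("C:\\workspace-apache\\beam\\counts.txt")); // c
  }
}
```

If this is what happens, no file system would be registered for the scheme `c`, which would match the error text; confirming it would require checking how the path actually reaches the file-system lookup in the test.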

Re: BEAM-933 - Not reproduceable

2017-07-10 Thread Kenneth Knowles
Well, if it is not reproducible then could you issue a pull request
deleting that bit of the pom.xml? That would resolve the issue, too.

Kenn

On Mon, Jul 10, 2017 at 10:01 AM, Apache Enthu 
wrote:

> Thanks Kenneth. Unfortunately i'm still unable to reproduce the issue. Did
> anyone had a chance to look at the other issue that i raised in my mail?
> Unfortunately as i am not a committer and hence am assuming i wont be
> entitle to create jira for the same.
>
> [INFO] --- maven-checkstyle-plugin:2.17:check (default) @
> beam-examples-java ---
> [INFO] Starting audit...
> Audit done.
> [INFO]
> [INFO] >>> findbugs-maven-plugin:3.0.4:check (default) > :findbugs @
> beam-examples-java >>>
> [INFO]
> [INFO] --- findbugs-maven-plugin:3.0.4:findbugs (findbugs) @
> beam-examples-java ---
> [INFO] Downloading:
> http://nexus.codehaus.org/snapshots/org/apache/beam/beam-sdks-java-build-tools/2.2.0-SNAPSHOT/maven-metadata.xml
> [WARNING] Could not transfer metadata
> org.apache.beam:beam-sdks-java-build-tools:2.2.0-SNAPSHOT/maven-metadata.xml
> from/to codehaus-snapshots (http://nexus.codehaus.org/snapshots/):
> nexus.codehaus.org
> [INFO]
> [INFO] <<< findbugs-maven-plugin:3.0.4:check (default) < :findbugs @
> beam-examples-java <<<
> [INFO]
> [INFO] --- findbugs-maven-plugin:3.0.4:check (default) @
> beam-examples-java ---
> [INFO]
> [INFO] --- maven-surefire-plugin:2.20:test (default-test) @
> beam-examples-java ---
>
>
> On Mon, Jul 10, 2017 at 9:09 AM, Kenneth Knowles 
> wrote:
>
> > I think the key line you will want to change is here:
> > https://github.com/apache/beam/blob/master/examples/java/pom.xml#L375
> >
> > On Sun, Jul 9, 2017 at 12:17 AM, Apache Enthu 
> > wrote:
> >
> > > Hi,
> > >
> > > Is BEAM-933 already fixed? I'm unable to reproduce the bug by running maven
> > > build. Here's what i see:
> > >
> > > [INFO] --- maven-compiler-plugin:3.6.1:testCompile (default-testCompile) @
> > > beam-examples-java ---
> > > [INFO] Changes detected - recompiling the module!
> > > [INFO] Compiling 15 source files to
> > > C:\workspace-apache\beam\examples\java\target\test-classes
> > > [INFO]
> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
> > > C:\workspace-apache\beam\examples\java\src\test\java\org\apache\beam\examples\WindowedWordCountIT.java
> > > uses or overrides a deprecated API.
> > > [INFO]
> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/WindowedWordCountIT.java:
> > > Recompile with -Xlint:deprecation for details.
> > > [INFO]
> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
> > > Some input files use unchecked or unsafe operations.
> > > [INFO]
> > > /C:/workspace-apache/beam/examples/java/src/test/java/org/apache/beam/examples/complete/AutoCompleteTest.java:
> > > Recompile with -Xlint:unchecked for details.
> > > [INFO]
> > > [INFO] --- maven-checkstyle-plugin:2.17:check (default) @
> > > beam-examples-java ---
> > > [INFO] Starting audit...
> > > Audit done.
> > > [INFO]
> > > [INFO] --- maven-surefire-plugin:2.20:test (default-test) @
> > > beam-examples-java ---
> > > [INFO]
> > >
> > > Could you please check and let me know, so we could close this issue.
> > >
> > > Also there seems to be an issue with DebuggingWordCountTest, running on
> > > Windows. It says:
> > >
> > >
> > > org.apache.beam.sdk.Pipeline$PipelineExecutionException:
> > > java.lang.IllegalStateException: Unable to find registrar for c
> > > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:303)
> > > at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
> > > at org.apache.beam.examples.DebuggingWordCount.main(DebuggingWordCount.java:160)
> > > at org.apache.beam.examples.DebuggingWordCountTest.testDebuggingWordCount(DebuggingWordCountTest.java:53)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > > at java.lang.reflect.Method.invoke(Method.java:498)
> > > at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> > > at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> > > at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> > > at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> > > at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
> > > at org.junit.rules.RunRules.evaluate(RunRules.java:20)
> > > at 

Mixed-Language Pipelines

2017-07-10 Thread Thomas Groh
Hey everyone;

I've been working on a design for implementing multi-language pipelines
within the Beam SDKs (also known as mix-and-match). This kind of pipeline
lets us reuse transforms written in one language in any other language that
supports the Runner API and the Fn API. Being able to write a transform once
and run it everywhere is something I'm pretty excited about.

The document is available at
https://s.apache.org/beam-mixed-language-pipelines. Comments and questions
are welcome, and I'm looking forward to any feedback.

Thanks,

Thomas


Re: [PROPOSAL] Connectors for memcache and Couchbase

2017-07-10 Thread Lukasz Cwik
Splitting on slabs should allow you to split more finely grained than per
server since each server itself maintains this information. If you take a
look at the memcached protocol, you can see that lru_crawler supports a
metadump command which will enumerate all the keys for a set of given slabs
or for all the slabs.
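
To make the slab-based splitting idea concrete, here is a minimal Java sketch. It assumes the `lru_crawler metadump` reply arrives as lines of the form `key=<url-encoded key> exp=... la=... cas=... fetch=... cls=<slab class> size=...` terminated by `END` (worth double-checking against protocol.txt for the memcached version in use). `MetadumpSplits` and `groupBySlabClass` are hypothetical names, not part of Beam or any memcached client; the code only groups dumped keys by slab class so that each group could back one split.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups keys from "lru_crawler metadump" output by slab class.
// Assumed entry format: "key=<url-encoded> exp=... la=... cas=... fetch=... cls=<n> size=...",
// with the dump terminated by a line "END".
class MetadumpSplits {

  static Map<Integer, List<String>> groupBySlabClass(List<String> lines) {
    Map<Integer, List<String>> bySlab = new TreeMap<>();
    for (String line : lines) {
      String trimmed = line.trim();
      if (trimmed.isEmpty() || trimmed.equals("END")) {
        continue; // skip blank lines and the end-of-dump marker
      }
      String key = null;
      Integer slabClass = null;
      for (String field : trimmed.split(" ")) {
        int eq = field.indexOf('=');
        if (eq < 0) {
          continue; // not a name=value pair
        }
        String name = field.substring(0, eq);
        String value = field.substring(eq + 1);
        if (name.equals("key")) {
          key = urlDecode(value); // keys are assumed to be URL-encoded in the dump
        } else if (name.equals("cls")) {
          slabClass = Integer.valueOf(value);
        }
      }
      if (key != null && slabClass != null) {
        bySlab.computeIfAbsent(slabClass, c -> new ArrayList<>()).add(key);
      }
    }
    return bySlab;
  }

  private static String urlDecode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is always supported", e);
    }
  }

  public static void main(String[] args) {
    List<String> dump = Arrays.asList(
        "key=user%3A1 exp=-1 la=1499700000 cas=1 fetch=no cls=1 size=63",
        "key=user%3A2 exp=-1 la=1499700001 cas=2 fetch=no cls=1 size=63",
        "key=blob exp=-1 la=1499700002 cas=3 fetch=no cls=5 size=900",
        "END");
    // Each entry (slab class -> keys) could become one bundle of a bounded source.
    System.out.println(groupBySlabClass(dump));
  }
}
```

In an actual source it would probably be cheaper to issue the metadump per slab class inside each split, since the command accepts a set of slabs, rather than dumping everything in one place and grouping afterwards.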

For the consistency part, you can get a snapshot-like effect (snapshot-like
since it's per server and not across the server farm) by combining
the "watch mutations evictions" command on one connection with the
"lru_crawler metadump all" on another connection to the same memcached
server. By first connecting using a watcher and then performing a dump you
can create two logical streams of data that can be joined to get a snapshot
per server. If the amount of data/mutations/evictions is small, you can
perform all of this within a DoFn; otherwise you can just treat each as two
different outputs which you join and perform the same logical operation to
rebuild the snapshot on a per key basis.
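
One possible reading of the per-key join, sketched in Java: start the watcher before the dump, then take each key's final membership from the last watcher event seen for it, and fall back to the dump for keys with no events. This assumes the raw `watch` output has already been parsed into ordered events classified as stores (key present afterwards) or removals (evictions, plus deletions if the server reports them), that no events are dropped, and it ignores TTL expiry. `PerServerSnapshot`, `Kind`, and `WatchEvent` are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: rebuild one server's key set by joining the metadump
// stream with the watcher stream. Parsing of the raw watcher lines is assumed done.
class PerServerSnapshot {

  // Assumption: every watcher event is classified as either a store or a removal.
  enum Kind { STORE, REMOVAL }

  static final class WatchEvent {
    final String key;
    final Kind kind;

    WatchEvent(String key, Kind kind) {
      this.key = key;
      this.kind = kind;
    }
  }

  /**
   * Returns the keys present when the watcher stopped. A key with at least one
   * event takes its state from the last event; a key with no events did not
   * change during the window, so the dump's view of it is kept.
   */
  static Set<String> rebuild(Collection<String> dumpedKeys, List<WatchEvent> eventsInOrder) {
    Map<String, Kind> lastEvent = new HashMap<>();
    for (WatchEvent e : eventsInOrder) {
      lastEvent.put(e.key, e.kind); // later events overwrite earlier ones
    }
    Set<String> present = new TreeSet<>();
    for (String key : dumpedKeys) {
      if (!lastEvent.containsKey(key)) {
        present.add(key);
      }
    }
    for (Map.Entry<String, Kind> entry : lastEvent.entrySet()) {
      if (entry.getValue() == Kind.STORE) {
        present.add(entry.getKey());
      }
    }
    return present;
  }

  public static void main(String[] args) {
    List<String> dumped = Arrays.asList("a", "b", "c");
    List<WatchEvent> events = Arrays.asList(
        new WatchEvent("b", Kind.REMOVAL), // evicted during the dump
        new WatchEvent("d", Kind.STORE));  // written after its slab was dumped
    System.out.println(rebuild(dumped, events)); // prints [a, c, d]
  }
}
```

This only settles which keys exist; the values for those keys would still have to be read, which is where the "snapshot-like" caveat above applies.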

Interestingly, the "watch mutations" command would allow one to build a
streaming memcache IO which shows all changes occurring underneath.

memcached protocol:
https://github.com/memcached/memcached/blob/master/doc/protocol.txt

On Mon, Jul 10, 2017 at 2:41 AM, Ismaël Mejía  wrote:

> Hello,
>
> Thanks Lukasz for bring some of this subjects. I have briefly
> discussed with the guys working on this they are the same team who did
> HCatalogIO (Hive).
>
> We just analyzed the different libraries that allowed to develop this
> integration from Java and decided that the most complete
> implementation was spymemcached. One thing I really didn’t like of
> their API is that there is not an abstraction for Mutation (like in
> Bigtable/Hbase) but a corresponding method for each operation so to
> make things easier we discussed to focus first on read/write.
>
> @Lukasz for the enumeration part, I am not sure I follow, we had just
> discussed a naive approach for splitting by server given that
> Memcached is not a cluster but a server farm ‘which means every server
> is its own’ we thought this will be the easiest way to partition, is
> there any technical issue that impeaches this (creating a
> BoundedSource and just read per each server)? Or partitioning by slabs
> will bring us a better optimization? (Notice I am far from an expert
> on Memcached).
>
> For the consistency part I assumed it will be inconsistent when
> reading, because I didn’t know how to do the snapshot but if you can
> give us more details on how to do this, and why it is worth the effort
> (vs the cost of the snapshot), this will be something interesting to
> integrate.
>
> Thanks,
> Ismaël
>
>
> On Sun, Jul 9, 2017 at 7:39 PM, Lukasz Cwik 
> wrote:
> > For the source:
> > Do you plan to support enumerating all the keys via cachedump / lru_crawler
> > metadump / ...?
> > If there is an option which doesn't require enumerating the keys, how will
> > splitting be done (no splitting / splitting on slab ids / ...)?
> > Can the cache be read while it's still being modified (will effectively a
> > snapshot be made using a watcher or is it expected that the cache will be
> > read only or inconsistent when reading)?
> >
> > Also, as a usability point, all PTransforms are meant to be applied to
> > PCollections and not vice versa.
> > e.g.
> > PCollection keys = ...;
> > keys.apply(MemCacheIO.withConfig());
> >
> > This makes it so that people can write:
> > PCollection<...> output =
> > input.apply(ptransform1).apply(ptransform2).apply(...);
> > It also makes it so that a PTransform can be applied to multiple
> > PCollections.
> >
> > If you haven't already, I would also suggest that you take a look at the
> > Pipeline I/O guide: https://beam.apache.org/documentation/io/io-toc/
> > Talks about various usability points and how to write a good I/O connector.
> >
> >
> > On Sat, Jul 8, 2017 at 9:31 PM, Jean-Baptiste Onofré 
> > wrote:
> >
> >> Hi,
> >>
> >> Great job !
> >>
> >> I'm looking forward for the PRs review.
> >>
> >> Regards
> >> JB
> >>
> >>
> >> On 07/08/2017 09:50 AM, Madhusudan Borkar wrote:
> >>
> >>> Hi,
> >>> We are proposing to build connectors for memcache first and then use it for
> >>> Couchbase. The connector for memcache will be built as an IOTransform and
> >>> then it can be used for other memcache implementations including
> >>> Couchbase.
> >>>
> >>> 1. As Source
> >>>
> >>> input will be a key (String / byte[]), output will be a KV<key, value>
> >>>
> >>> where key - String / byte[]
> >>>
> >>> value - String / byte[]
> >>>
> >>> Spymemcached supports a multi-get operation where it takes a bunch of
> >>> keys and retrieves the associated values, the input PCollection can be
> >>> bundled into multiple batches and each batch can be submitted via the
> >>> multi-get operation.
> >>>
> >>> PCollection> 
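
The multi-get batching described in the proposal could look roughly like the dependency-free Java sketch below. The `multiGet` function stands in for a bulk get call (spymemcached's `MemcachedClient.getBulk(Collection<String>)` is one candidate); `BatchedLookup` and `lookupInBatches` are illustrative names, not part of Beam or spymemcached. Keys that miss are simply absent from the result, mirroring how a memcached multi-get returns only hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: look up keys in fixed-size batches via a bulk "multi-get".
class BatchedLookup {

  static Map<String, String> lookupInBatches(
      List<String> keys, int batchSize, Function<List<String>, Map<String, String>> multiGet) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    Map<String, String> results = new LinkedHashMap<>();
    List<String> batch = new ArrayList<>(batchSize);
    for (String key : keys) {
      batch.add(key);
      if (batch.size() == batchSize) {
        results.putAll(multiGet.apply(batch)); // one round trip per full batch
        batch = new ArrayList<>(batchSize);    // fresh list; the old one is not reused
      }
    }
    if (!batch.isEmpty()) {
      results.putAll(multiGet.apply(batch)); // flush the final partial batch
    }
    return results;
  }

  public static void main(String[] args) {
    Map<String, String> cache = new HashMap<>();
    cache.put("k1", "v1");
    cache.put("k3", "v3");
    // Fake multi-get: returns only the keys that are present, like a memcached miss.
    Function<List<String>, Map<String, String>> fakeMultiGet = batch -> {
      Map<String, String> hits = new HashMap<>();
      for (String k : batch) {
        if (cache.containsKey(k)) {
          hits.put(k, cache.get(k));
        }
      }
      return hits;
    };
    System.out.println(lookupInBatches(Arrays.asList("k1", "k2", "k3"), 2, fakeMultiGet));
  }
}
```

Inside a Beam `DoFn`, the same idea would usually mean buffering keys in `@ProcessElement` and flushing the remainder in `@FinishBundle`; emitting from `@FinishBundle` needs an explicit timestamp and window, which this sketch leaves out.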

Jenkins build is back to normal : beam_Release_NightlySnapshot #473

2017-07-10 Thread Apache Jenkins Server
See