Re: Documenting Metrics API?

2018-05-15 Thread Etienne Chauchot
Hi Pablo,
I don't know if this is what you're looking for, but we have at least this 
user-facing doc, although it is a bit old:
https://docs.google.com/document/d/1voyUIQ2DrWkoY-BsJwM8YvF4gGKB76CDG8BYL8XBc7A/edit#heading=h.vv2fbulkp7t
Etienne
On Friday, May 11, 2018 at 15:00 -0700, Lukasz Cwik wrote:
> The programming guide doesn't have the information you're looking for. As 
> Kenn says, it should have it, but it currently doesn't.
> 
> On Fri, May 11, 2018 at 1:02 PM Kenneth Knowles  wrote:
> > I think the programming guide needs to have end-user documentation.
> > 
> > Kenn
> > 
> > On Fri, May 11, 2018 at 12:58 PM Lukasz Cwik  wrote:
> > > Are you speaking about metrics related to portability? If so, Alex shared 
> > > this doc a while back: https://s.apache.org/beam-fn-api-metrics
> > > 
> > > Otherwise, I'm not aware of any metrics related documentation for Apache 
> > > Beam on the website.
> > > 
> > > 
> > > 
> > > On Fri, May 11, 2018 at 12:02 PM Pablo Estrada  wrote:
> > > > Hello all,
> > > > I could not find a place where the Beam Metrics API is well detailed on
> > > > the Beam website. Is there a JIRA tracking this? Perhaps our use-case
> > > > driven docs + java/pydoc cover it well enough, but I'm not sure that
> > > > that's the case.
> > > > 
> > > > Thanks
> > > > -P 
> > > > -- 
> > > > Got feedback? go/pabloem-feedback
> > > > 

Re: Documenting Metrics API?

2018-05-15 Thread Aviem Zur
There is an open task for this on JIRA:

https://issues.apache.org/jira/browse/BEAM-1974

On Tue, May 15, 2018 at 10:56 AM Etienne Chauchot 
wrote:

> Hi Pablo,
> I don't know if it is what you seek but we have at least this doc that is
> user facing but it is a bit old:
>
> https://docs.google.com/document/d/1voyUIQ2DrWkoY-BsJwM8YvF4gGKB76CDG8BYL8XBc7A/edit#heading=h.vv2fbulkp7t
>
> Etienne
>
> On Friday, May 11, 2018 at 15:00 -0700, Lukasz Cwik wrote:
>
> The programming guide doesn't have the information you're looking for. As
> Kenn says, it should have it, but it currently doesn't.
>
> On Fri, May 11, 2018 at 1:02 PM Kenneth Knowles  wrote:
>
> I think the programming guide needs to have end-user documentation.
>
> Kenn
>
> On Fri, May 11, 2018 at 12:58 PM Lukasz Cwik  wrote:
>
> Are you speaking about metrics related to portability? If so, Alex shared
> this doc a while back: https://s.apache.org/beam-fn-api-metrics
>
> Otherwise, I'm not aware of any metrics related documentation for Apache
> Beam on the website.
>
>
>
> On Fri, May 11, 2018 at 12:02 PM Pablo Estrada  wrote:
>
> Hello all,
> I could not find a place where the Beam Metrics API is well detailed on the
> Beam website. Is there a JIRA tracking this? Perhaps our use-case driven
> docs + java/pydoc cover it well enough, but I'm not sure that that's the
> case.
>
> Thanks
> -P
> --
> Got feedback? go/pabloem-feedback
> 
>
>
>
>


[SQL] Cross Join Operation

2018-05-15 Thread Kai Jiang
Hi everyone,

As a proof of concept for my GSoC project, I have been running some simple
TPC-DS queries against generated data on the direct runner (query example).

The example uses TPC-DS query 9. Briefly, query 9 uses CASE WHEN clauses to
select 5 counts from store_sales (table 1). To show those result numbers,
the CASE WHEN clauses sit inside a single SELECT clause. In short, it looks
like:
SELECT

CASE WHEN ( SELECT count(*) FROM table 1 WHERE ... )
THEN condition 1
ELSE condition 2,
...
CASE WHEN ...

FROM table 2

IIUC, this query doesn't need a join between table 1 and table 2, since the
outer SELECT clause doesn't touch table 1. But the program plans one anyway
and throws "java.lang.UnsupportedOperationException: CROSS JOIN is not
supported" (error message detail).

To make the query work, I am wondering where to start:
1. Look at the logical plan?
Will the logical plan explain why the query needs a CROSS JOIN?

2. Add cross join support?
I checked all the queries in the TPC-DS benchmark, and almost every query
uses a cross join, so it is an important feature to implement. Unlike other
joins, it consumes a lot of computing resources, but I think we will need it
in the future. Should we also support it in join-library? I noticed James has
opened BEAM-2194 for supporting cross join.

Looking forward to comments!

Best,
Kai



Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Lukasz Cwik
I misspoke when I said portability semantics and should have said
portability design/implementation. This is why I sent a follow-up e-mail
clarifying what I'm confused about:
* I don't understand how you would want close to change the semantics of a
user state specification, or how it affects the lifetime of user state.
** Does it represent committing information within a bundle?
** Does it mean that user state can ignore the replayable and consistent
semantics for the lifetime of a bundle?

I'm trying to tie back what a `ReadableState<Iterator<T>> readIterator()`
means for Runner authors and how it solves the memory/close() problem for
Samza. Based upon https://s.apache.org/beam-state, reading and writing of
state must be consistent. Does this mean that Samza must use snapshots for
the lifetime of a bundle? If so, I don't see how adding a
`ReadableState<Iterator<T>> readIterator()` allows Samza to ignore the
consistency requirement and free snapshots.

It might be worthwhile to set up a three-way Hangouts call to help me, as I
don't have the same level of context; the outcome can be shared back to this
thread. Xinyu / Kenn, how about we set up a time using Slack?

On Mon, May 14, 2018 at 8:36 PM Kenneth Knowles  wrote:

> I feel like this discussion is kind of far from the primary intention. The
> point of ParDo(stateful DoFn) is to enable naive single-threaded code in a
> style intuitive to a beginning imperative programmer. So:
>
>  - the return value of read() should act like an immutable value
>  - if there is a read after a write, that read should reflect the changes
> written, because there's a trivial happens-before relationship induced by
> program order
>  - there are no non-trivial happens-before relationships
>  - a write after a read should not affect the value read before
>
> From that starting point, the limitations on expressiveness (single
> threaded per-key-and-window-and-step) give rise to embarrassingly parallel
> computation and GC.
>
> So when you say "portability semantics" it makes me concerned. The word
> "semantics" refers to the mapping between what a user writes and what it
> means. Primarily, that means the conceptual translation between a primitive
> PTransform and the corresponding PCollection-to-PCollection operation. It
> is true that portability complicates things because a user's code can only
> be given meaning relative to an SDK harness. But the end-user intention is
> unchanged. If we had time and resources to give the Fn API a solid spec, it
> should be such that the SDK harness has little choice but to implement the
> primitives as intended.
>
> In other words, it is the job of
> https://s.apache.org/beam-fn-state-api-and-bundle-processing* to
> implement https://s.apache.org/beam-state. The discussion of adding
> `ReadableState readIterator()` to BagState seems consistent with
> the latter.
>
> Kenn
>
> *IIUC you've chosen to use the same underlying proto service for side
> inputs and by-reference values. That's an implementation detail that I have
> no particular opinion about except that if it complicates implementing the
> primary motivator for state, it may not be a great fit.
>
>
>
> On Mon, May 14, 2018 at 7:35 PM Kenneth Knowles  wrote:
>
>> I don't follow why allowing freeing resources would be counter to the
>> spec. I don't really know what you mean by consistent for a bundle. State,
>> in the sense of the user-facing per-key-and-window state API, is single
>> threaded and scoped to a single DoFn. There's no one else who can write the
>> state. If a BagState is read and written and read again, the user-facing
>> logic should be unaware of the resources and not have any logic to deal
>> with consistency.
>>
>> Kenn
>>
>> On Mon, May 14, 2018 at 6:09 PM Lukasz Cwik  wrote:
>>
>>> Hmm, some of the problem I'm dealing with is:
>>> * I don't understand how you would want close to change the semantics of
>>> a user state specification and how it affects the lifetime of user state?
>>> ** Does it represent committing information within a bundle?
>>> ** Does it mean that user state can ignore the replayable and consistent
>>> semantics for a lifetime of a bundle semantics?
>>> * I do understand that close semantics may make it easier for Samza
>>> runner to support state that is greater than memory, but what does it
>>> provide to users and how would they benefit (like what new scenarios would
>>> it support)?
>>> * I don't understand in RocksDb why you need a snapshot in the first
>>> place, and whether RocksDb can support seeking inside or outside a snapshot
>>> relatively efficiently?
>>>
>>> There seems to be some context lost initially with a discussion that you
>>> and Kenn had, is there a copy of that which could be shared? May help me
>>> get up to speed on the problem that is being solved.
>>>
>>>
>>> On Mon, May 14, 2018 at 5:11 PM Xinyu Liu  wrote:
>>>
>>>> I will take a look at the docs to understand the problem better. A
>>>> minor comment to 2) is that I do

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Kenneth Knowles
OK, got it. But what consistency are you referring to? I was trying to
point out that there's nothing but straight-line program order consistency.
There's only one actor doing all the reads and all the writes.

Kenn

On Tue, May 15, 2018 at 8:39 AM Lukasz Cwik  wrote:

> I misspoke when I said portability semantics and should have said
> portability design/implementation. This is why I had a follow-up e-mail and
> clarified that I'm confused on:
> * I don't understand how you would want close to change the semantics of a
> user state specification and how it affects the lifetime of user state?
> ** Does it represent committing information within a bundle?
> ** Does it mean that user state can ignore the replayable and consistent
> semantics for a lifetime of a bundle semantics?
>
> I'm trying to tie back what does a `ReadableState
> readIterator()` means for Runner authors and how it solves the
> memory/close() problem for Samza. Based upon
> https://s.apache.org/beam-state, reading and writing of state must be
> consistent. Does this mean that Samza must use snapshots for the lifetime
> of a bundle? If so, I don't see how adding a `ReadableState
> readIterator()` allows Samza to ignore the consistency requirement and be
> allowed to free snapshots.
>
> It might be worthwhile to setup a three way hangouts call to help me as I
> don't have the same level of context which can be shared back to this
> thread. Xinyu / Kenn, how about we setup a time using Slack?
>
> On Mon, May 14, 2018 at 8:36 PM Kenneth Knowles  wrote:
>
>> I feel like this discussion is kind of far from the primary intention.
>> The point of ParDo(stateful DoFn) is to enable naive single-threaded code
>> in a style intuitive to a beginning imperative programmer. So:
>>
>>  - the return value of read() should act like an immutable value
>>  - if there is a read after a write, that read should reflect the changes
>> written, because there's a trivial happens-before relationship induced by
>> program order
>>  - there are no non-trivial happens-before relationships
>>  - a write after a read should not affect the value read before
>>
>> From that starting point, the limitations on expressiveness (single
>> threaded per-key-and-window-and-step) give rise to embarrassingly parallel
>> computation and GC.
>>
>> So when you say "portability semantics" it makes me concerned. The word
>> "semantics" refers to the mapping between what a user writes and what it
>> means. Primarily, that means the conceptual translation between a primitive
>> PTransform and the corresponding PCollection-to-PCollection operation. It
>> is true that portability complicates things because a user's code can only
>> be given meaning relative to an SDK harness. But the end-user intention is
>> unchanged. If we had time and resources to give the Fn API a solid spec, it
>> should be such that the SDK harness has little choice but to implement the
>> primitives as intended.
>>
>> In other words, it is the job of
>> https://s.apache.org/beam-fn-state-api-and-bundle-processing* to
>> implement https://s.apache.org/beam-state. The discussion of adding
>> `ReadableState readIterator()` to BagState seems consistent with
>> the latter.
>>
>> Kenn
>>
>> *IIUC you've chosen to use the same underlying proto service for side
>> inputs and by-reference values. That's an implementation detail that I have
>> no particular opinion about except that if it complicates implementing the
>> primary motivator for state, it may not be a great fit.
>>
>>
>>
>> On Mon, May 14, 2018 at 7:35 PM Kenneth Knowles  wrote:
>>
>>> I don't follow why allowing freeing resources would be counter to the
>>> spec. I don't really know what you mean by consistent for a bundle. State,
>>> in the sense of the user-facing per-key-and-window state API, is single
>>> threaded and scoped to a single DoFn. There's no one else who can write the
>>> state. If a BagState is read and written and read again, the user-facing
>>> logic should be unaware of the resources and not have any logic to deal
>>> with consistency.
>>>
>>> Kenn
>>>
>>> On Mon, May 14, 2018 at 6:09 PM Lukasz Cwik  wrote:
>>>
>>>> Hmm, some of the problem I'm dealing with is:
>>>> * I don't understand how you would want close to change the semantics
>>>> of a user state specification and how it affects the lifetime of user
>>>> state?
>>>> ** Does it represent committing information within a bundle?
>>>> ** Does it mean that user state can ignore the replayable and
>>>> consistent semantics for a lifetime of a bundle semantics?
>>>> * I do understand that close semantics may make it easier for Samza
>>>> runner to support state that is greater than memory, but what does it
>>>> provide to users and how would they benefit (like what new scenarios would
>>>> it support)?
>>>> * I don't understand in RocksDb why you need a snapshot in the first
>>>> place, and whether RocksDb can support seeking inside or outside a snapshot
>>>> relatively efficiently?

>

Re: [SQL] Cross Join Operation

2018-05-15 Thread Kenneth Knowles
The logical plan should show you where the cross join is needed. Here is
where it is logged:
https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L150

(It should probably be put to DEBUG level)

If I look at the original template, like
https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query9.tpl
I see conditions "[RC.1]". Are those templates expected to be filled with
references to the `reason` table, perhaps? How does that change things?

I still think it would be good to support CROSS JOIN if we can - the
problem of course is huge data size, but when one side is small it would be
good for it to work simply.

Kenn

On Tue, May 15, 2018 at 7:41 AM Kai Jiang  wrote:

> Hi everyone,
>
> To prove the idea of GSoC project, I was working on some simple TPC-DS
> queries running with given generated data on direct runner. query example
> 
>
> The example is executed with TPC-DS query 9
> .
> Briefly, Query 9 uses case when clauses to select 5 counting numbers from 
> store_sales
> (table 1). In order to show those result numbers, case when clause inside
> one select clause. In short, it looks like:
> SELECT
>
> CASE WHEN ( SELECT count(*)  FROM  table 1 WHERE. )
> THEN condition 1
> ELSE condition 2,
> .
> CASE WHEN .
>
> FROM table 2
>
> IIUC, this query doesn't need join operation on table 1 and table 2 since
> outside select clause doesn't need to interfere with table 1.
> But, the program shows it does and throws errors message said
> "java.lang.UnsupportedOperationException: CROSS JOIN is not supported". (error
> message detail
> )
>
> To make the query work, I am wondering where I can start with:
> 1. see logic plan?
> Will logic plan explain why the query need CROSS JOIN?
>
> 2. cross join support?
> I checked all queries in TPC-DS benchmark. Almost every query uses cross
> join. It is an important feature needs to implement. Unlike other join, it
> consumes a lot of computing resource. But, I think we need cross join in
> the future. and support both in join-library? I noticed James has open
> BEAM-2194  for
> supporting cross join.
>
> Looking forward to comments!
>
> Best,
> Kai
>
>


Re: [SQL] Cross Join Operation

2018-05-15 Thread Andrew Pilloud
Calcite does not have the concept of a "CROSS JOIN". It shows up in the
plan as a LogicalJoin with condition=[true]. We could try rejecting cross
joins at the planning stage by returning null for them
in BeamJoinRule.convert(), which might result in a different plan. But
looking at your query, you have a cross join unless the WHERE clause on the
inner select references a row from the outer select.
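
To make the implicit cross join concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Beam SQL or Calcite. The table contents, the `qty` and `r_reason_sk` columns, and the threshold are made up for illustration; only the table names `store_sales` and `reason` come from the TPC-DS query. It shows that an uncorrelated scalar subquery in the SELECT list returns the same rows as an explicit cross join against a one-row aggregate, which is the kind of shape a planner may rewrite it into.

```python
import sqlite3

# Hypothetical miniature tables; names follow TPC-DS, contents are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (qty INTEGER);
    CREATE TABLE reason (r_reason_sk INTEGER);
    INSERT INTO store_sales VALUES (1), (5), (30);
    INSERT INTO reason VALUES (1), (2);
""")

# Shape of the original query: an uncorrelated scalar subquery per output row.
subquery_form = conn.execute("""
    SELECT CASE WHEN (SELECT count(*) FROM store_sales
                      WHERE qty BETWEEN 1 AND 20) > 1
                THEN 'many' ELSE 'few' END
    FROM reason
""").fetchall()

# Equivalent shape: pair every outer row with the single aggregate row.
# There is no join condition, so this is a cross join.
cross_join_form = conn.execute("""
    SELECT CASE WHEN c.cnt > 1 THEN 'many' ELSE 'few' END
    FROM reason
    CROSS JOIN (SELECT count(*) AS cnt FROM store_sales
                WHERE qty BETWEEN 1 AND 20) AS c
""").fetchall()
```

Because the aggregate side is a single row, this particular cross join is cheap, which is the "one side is small" case mentioned earlier in the thread.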

Andrew

On Tue, May 15, 2018 at 9:15 AM Kenneth Knowles  wrote:

> The logical plan should show you where the cross join is needed. Here is
> where it is logged:
> https://github.com/apache/beam/blob/master/sdks/java/extensions/sql/src/main/java/org/apache/beam/sdk/extensions/sql/impl/planner/BeamQueryPlanner.java#L150
>
> (It should probably be put to DEBUG level)
>
> If I look at the original template, like
> https://github.com/gregrahn/tpcds-kit/blob/master/query_templates/query9.tpl
> I see conditions "[RC.1]". Are those templates expected to be filled with
> references to the `reason` table, perhaps? How does that change things?
>
> I still think it would be good to support CROSS JOIN if we can - the
> problem of course is huge data size, but when one side is small it would be
> good for it to work simply.
>
> Kenn
>
> On Tue, May 15, 2018 at 7:41 AM Kai Jiang  wrote:
>
>> Hi everyone,
>>
>> To prove the idea of GSoC project, I was working on some simple TPC-DS
>> queries running with given generated data on direct runner. query example
>> 
>>
>> The example is executed with TPC-DS query 9
>> .
>> Briefly, Query 9 uses case when clauses to select 5 counting numbers
>> from store_sales (table 1). In order to show those result numbers, case
>> when clause inside one select clause. In short, it looks like:
>> SELECT
>>
>> CASE WHEN ( SELECT count(*)  FROM  table 1 WHERE. )
>> THEN condition 1
>> ELSE condition 2,
>> .
>> CASE WHEN .
>>
>> FROM table 2
>>
>> IIUC, this query doesn't need join operation on table 1 and table 2 since
>> outside select clause doesn't need to interfere with table 1.
>> But, the program shows it does and throws errors message said
>> "java.lang.UnsupportedOperationException: CROSS JOIN is not supported". 
>> (error
>> message detail
>> )
>>
>> To make the query work, I am wondering where I can start with:
>> 1. see logic plan?
>> Will logic plan explain why the query need CROSS JOIN?
>>
>> 2. cross join support?
>> I checked all queries in TPC-DS benchmark. Almost every query uses cross
>> join. It is an important feature needs to implement. Unlike other join, it
>> consumes a lot of computing resource. But, I think we need cross join in
>> the future. and support both in join-library? I noticed James has open
>> BEAM-2194  for
>> supporting cross join.
>>
>> Looking forward to comments!
>>
>> Best,
>> Kai
>>
>>
>


Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Lukasz Cwik
In that case my confusion lies with how or why adding support for an
iterator helps Samza resolve its memory issues. The consistency questions
are about how a Runner chooses to parallelize execution of a straight-line
program. For example, a runner may choose to use optimistic locking and
process two concurrent bundles at the same time for the same key and
window. If at most one of them does a write, both bundles can be
successfully committed. If Samza does this, then it may not be able to free
snapshots within a bundle, because it may not be able to emulate
straight-line program order consistency. If it always processes key+window
pairs serially, then I'm not sure why Samza needs to use a snapshot in the
first place; it should be able to read data from RocksDb directly.
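
The optimistic-locking scenario can be sketched as follows. This is only an illustration under stated assumptions: `KeyedState` and `Bundle` are hypothetical names, not Beam or Samza classes, and a real runner would track far more than one version counter. It shows why a bundle has to hold on to a snapshot for its whole lifetime when another bundle may commit concurrently, and why two bundles can both commit when at most one of them writes.

```python
class KeyedState:
    """Hypothetical versioned state cell for one key and window."""

    def __init__(self):
        self.values = []
        self.version = 0


class Bundle:
    """Hypothetical bundle that buffers its writes until commit."""

    def __init__(self, state):
        self.state = state
        self.read_version = state.version    # version observed at start
        self.snapshot = tuple(state.values)  # held for the bundle's lifetime
        self.pending = []                    # buffered writes

    def read(self):
        # Straight-line order: the snapshot plus this bundle's own writes.
        return list(self.snapshot) + self.pending

    def add(self, value):
        self.pending.append(value)

    def commit(self):
        if not self.pending:
            return True   # read-only bundles never conflict
        if self.state.version != self.read_version:
            return False  # another writer committed first; retry the bundle
        self.state.values.extend(self.pending)
        self.state.version += 1
        return True


state = KeyedState()

# One writer and one reader on the same key and window: both commit.
reader, writer = Bundle(state), Bundle(state)
writer.add("a")
one_writer = (writer.commit(), reader.commit())
reader_view = reader.read()  # still the pre-commit snapshot

# Two writers on the same key and window: the second one must retry.
first, second = Bundle(state), Bundle(state)
first.add("b")
second.add("c")
two_writers = (first.commit(), second.commit())
```

If key+window pairs are always processed serially, `read_version` can never go stale, and the snapshot field in this sketch is no longer needed for correctness.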

The other point of confusion is how readIterator() is any different from
returning a wrapper that just invokes read().iterator() on a BagState. If
that is the case, then this is a trivial change which doesn't need to change
the portability design. The nuance is likely that the interface a Runner
implements to provide support for user state may give it more visibility
into what is going on than the portability framework can provide as is.

On Tue, May 15, 2018 at 8:57 AM Kenneth Knowles  wrote:

> OK, got it. But what consistency are you referring to? I was trying to
> point out that there's nothing but straight-line program order consistency.
> There's only one actor doing all the reads and all the writes.
>
> Kenn
>
> On Tue, May 15, 2018 at 8:39 AM Lukasz Cwik  wrote:
>
>> I misspoke when I said portability semantics and should have said
>> portability design/implementation. This is why I had a follow-up e-mail and
>> clarified that I'm confused on:
>> * I don't understand how you would want close to change the semantics of
>> a user state specification and how it affects the lifetime of user state?
>> ** Does it represent committing information within a bundle?
>> ** Does it mean that user state can ignore the replayable and consistent
>> semantics for a lifetime of a bundle semantics?
>>
>> I'm trying to tie back what does a `ReadableState
>> readIterator()` means for Runner authors and how it solves the
>> memory/close() problem for Samza. Based upon
>> https://s.apache.org/beam-state, reading and writing of state must be
>> consistent. Does this mean that Samza must use snapshots for the lifetime
>> of a bundle? If so, I don't see how adding a `ReadableState
>> readIterator()` allows Samza to ignore the consistency requirement and be
>> allowed to free snapshots.
>>
>> It might be worthwhile to setup a three way hangouts call to help me as I
>> don't have the same level of context which can be shared back to this
>> thread. Xinyu / Kenn, how about we setup a time using Slack?
>>
>> On Mon, May 14, 2018 at 8:36 PM Kenneth Knowles  wrote:
>>
>>> I feel like this discussion is kind of far from the primary intention.
>>> The point of ParDo(stateful DoFn) is to enable naive single-threaded code
>>> in a style intuitive to a beginning imperative programmer. So:
>>>
>>>  - the return value of read() should act like an immutable value
>>>  - if there is a read after a write, that read should reflect the
>>> changes written, because there's a trivial happens-before relationship
>>> induced by program order
>>>  - there are no non-trivial happens-before relationships
>>>  - a write after a read should not affect the value read before
>>>
>>> From that starting point, the limitations on expressiveness (single
>>> threaded per-key-and-window-and-step) give rise to embarrassingly parallel
>>> computation and GC.
>>>
>>> So when you say "portability semantics" it makes me concerned. The word
>>> "semantics" refers to the mapping between what a user writes and what it
>>> means. Primarily, that means the conceptual translation between a primitive
>>> PTransform and the corresponding PCollection-to-PCollection operation. It
>>> is true that portability complicates things because a user's code can only
>>> be given meaning relative to an SDK harness. But the end-user intention is
>>> unchanged. If we had time and resources to give the Fn API a solid spec, it
>>> should be such that the SDK harness has little choice but to implement the
>>> primitives as intended.
>>>
>>> In other words, it is the job of
>>> https://s.apache.org/beam-fn-state-api-and-bundle-processing* to
>>> implement https://s.apache.org/beam-state. The discussion of adding
>>> `ReadableState readIterator()` to BagState seems consistent with
>>> the latter.
>>>
>>> Kenn
>>>
>>> *IIUC you've chosen to use the same underlying proto service for side
>>> inputs and by-reference values. That's an implementation detail that I have
>>> no particular opinion about except that if it complicates implementing the
>>> primary motivator for state, it may not be a great fit.
>>>
>>>
>>>
>>> On Mon, May 14, 2018 at 7:35 PM Kenneth Knowles  wrote:
>>>
>>>

Re: Survey: what is everyone working on that you want to share?

2018-05-15 Thread David Morávek
Hi Kenn,

Java 8 DSL

JIRA: dsl-euphoria / BEAM-3900
Feature branch: dsl-euphoria
Contact: David Moravek
Description: Java 8 wrapper over the Beam Java SDK, based on the Euphoria API
project.


Thanks,

David



On Tue, May 15, 2018 at 7:27 AM, Kenneth Knowles  wrote:

> Hi all,
>
> TL;DR: for anyone who is willing & able to answer: What big-picture thing
> are you working on? I want to highlight it here:
>
> https://beam.apache.org/contribute/#works-in-progress
>
> (I want each of these to have a nice description, too! And a saved JIRA
> search for starter tasks would be clever...)
>
>  context 
>
> Following feedback from the Beam summit and other discussions, I've been
> working with Melissa and looping in folks to revamp the website to have a
> more welcoming tone and to make it easier for newcomers to get started.
>
> Part of that is making the site and guide more concise and making the
> entry points more prominent and interesting. Starter tasks are OK, but to
> get a lasting engagement the idea is for starter tasks to connect a newcomer
> with ongoing projects. Most importantly, they are more likely to have
> exciting interactions with experienced contributors, and to have follow-up
> work to the starter task.
>
> Kenn
>


Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Kenneth Knowles
On Tue, May 15, 2018 at 9:36 AM Lukasz Cwik  wrote:

> If it always processes key+window pairs serially, then I'm not sure why
> Samza needs to use a snapshot in the first place and should be able to read
> data from RocksDb directly.
>
> The other point of confusion is around how is a readIterator() any
> different then returning a wrapper that just invokes read().iterator() on a
> BagState. If that is the case then this is a trivial change which doesn't
> need to change the portability design.
>

I think this is a key point. Just calling out the obvious to talk about it:

 - read() returns an iterable - a value with fixed contents that can be
iterated many times
 - readIterator() would return an iterator - a value with fixed contents
that can be iterated only once

In both cases, later writes should not affect the observed contents. So
they both need some kind of snapshot for their lifetime. With
read().iterator() the intermediate iterable is immediately garbage, but
this is the reference queue based solution again, with some ref tracking
for iterables/iterators.
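
The two contracts above can be sketched in a few lines. This is a toy under stated assumptions: `BagStateSketch` and `read_iterator` are hypothetical names standing in for the proposal, not the Beam API, and copying the contents is just the simplest possible snapshot.

```python
class BagStateSketch:
    """Hypothetical stand-in for a bag state cell; not the Beam API."""

    def __init__(self):
        self._items = []

    def add(self, item):
        self._items.append(item)

    def read(self):
        # Iterable: fixed contents that can be iterated many times.
        return tuple(self._items)

    def read_iterator(self):
        # Iterator: fixed contents that can be consumed only once.
        return iter(tuple(self._items))


bag = BagStateSketch()
bag.add(1)
bag.add(2)

iterable = bag.read()
iterator = bag.read_iterator()
bag.add(3)  # a later write must not leak into the earlier reads

first_pass = list(iterable)     # [1, 2]
second_pass = list(iterable)    # [1, 2] again
once = list(iterator)           # [1, 2]
again = list(iterator)          # [] because the iterator is exhausted
after_write = list(bag.read())  # [1, 2, 3]
```

Both methods copy the contents here, so neither saves memory; the point is only the difference in contract. The one-shot contract is what would let a runner release the underlying resources as soon as the iterator is exhausted or closed.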

I now see what you mean about committing within a bundle; we definitely
need to atomically commit the completion of upstream elements, output
elements, and state modifications. What are your thoughts on this Xinyu?

Happy to join a quick hangout.

Kenn


> The nuance is likely that the interface between what a Runner implements to
> provide support for user state may give itself more visibility into what is
> going on than what the portability framework can provide as is.
>
> On Tue, May 15, 2018 at 8:57 AM Kenneth Knowles  wrote:
>
>> OK, got it. But what consistency are you referring to? I was trying to
>> point out that there's nothing but straight-line program order consistency.
>> There's only one actor doing all the reads and all the writes.
>>
>> Kenn
>>
>> On Tue, May 15, 2018 at 8:39 AM Lukasz Cwik  wrote:
>>
>>> I misspoke when I said portability semantics and should have said
>>> portability design/implementation. This is why I had a follow-up e-mail and
>>> clarified that I'm confused on:
>>> * I don't understand how you would want close to change the semantics of
>>> a user state specification and how it affects the lifetime of user state?
>>> ** Does it represent committing information within a bundle?
>>> ** Does it mean that user state can ignore the replayable and consistent
>>> semantics for a lifetime of a bundle semantics?
>>>
>>> I'm trying to tie back what does a `ReadableState
>>> readIterator()` means for Runner authors and how it solves the
>>> memory/close() problem for Samza. Based upon
>>> https://s.apache.org/beam-state, reading and writing of state must be
>>> consistent. Does this mean that Samza must use snapshots for the lifetime
>>> of a bundle? If so, I don't see how adding a `ReadableState
>>> readIterator()` allows Samza to ignore the consistency requirement and be
>>> allowed to free snapshots.
>>>
>>> It might be worthwhile to setup a three way hangouts call to help me as
>>> I don't have the same level of context which can be shared back to this
>>> thread. Xinyu / Kenn, how about we setup a time using Slack?
>>>
>>> On Mon, May 14, 2018 at 8:36 PM Kenneth Knowles  wrote:
>>>
>>>> I feel like this discussion is kind of far from the primary intention.
>>>> The point of ParDo(stateful DoFn) is to enable naive single-threaded code
>>>> in a style intuitive to a beginning imperative programmer. So:
>>>>
>>>>  - the return value of read() should act like an immutable value
>>>>  - if there is a read after a write, that read should reflect the
>>>> changes written, because there's a trivial happens-before relationship
>>>> induced by program order
>>>>  - there are no non-trivial happens-before relationships
>>>>  - a write after a read should not affect the value read before
>>>>
>>>> From that starting point, the limitations on expressiveness (single
>>>> threaded per-key-and-window-and-step) give rise to embarrassingly parallel
>>>> computation and GC.
>>>>
>>>> So when you say "portability semantics" it makes me concerned. The word
>>>> "semantics" refers to the mapping between what a user writes and what it
>>>> means. Primarily, that means the conceptual translation between a primitive
>>>> PTransform and the corresponding PCollection-to-PCollection operation. It
>>>> is true that portability complicates things because a user's code can only
>>>> be given meaning relative to an SDK harness. But the end-user intention is
>>>> unchanged. If we had time and resources to give the Fn API a solid spec, it
>>>> should be such that the SDK harness has little choice but to implement the
>>>> primitives as intended.
>>>>
>>>> In other words, it is the job of
>>>> https://s.apache.org/beam-fn-state-api-and-bundle-processing* to
>>>> implement https://s.apache.org/beam-state. The discussion of adding
>>>> `ReadableState readIterator()` to BagState seems consistent with
>>>> the latter.
>>>>
>>>> Kenn
>>>>
>>>> *IIUC yo

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Xinyu Liu
The Samza runner always processes key+window pairs serially. To
answer Luke's question:

- Why does Samza need to use a snapshot in the first place, rather than
reading data from RocksDb directly?
I believe that the point of adding readIterator() is to allow us to
read from RocksDb directly without bulk loading into memory. In this case
Samza will create the RocksDb iterator and expose it through the interface
to the user. There is no need to create an explicit snapshot for this. For
read() we need to load the content into memory so we can release the RocksDb
resources immediately and the user can iterate many times later. The reason
I mentioned snapshots is that if we don't load into memory for read(),
we need to maintain a RocksDb snapshot under the hood so the user can iterate
over the same content. This will cause memory and performance issues for our
users.
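
To make the distinction concrete, here is a rough Python sketch of the two read paths. It is only an illustration under assumptions: BagStateSketch, FakeStoreIterator and read_iterator are hypothetical names, an in-process list stands in for RocksDb, and it does not attempt to isolate the streaming cursor from later writes.

```python
from contextlib import closing


class FakeStoreIterator:
    """One-shot cursor over a backing store (stands in for a native iterator)."""

    def __init__(self, items):
        self._it = iter(items)
        self.closed = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.closed:
            raise StopIteration
        return next(self._it)

    def close(self):
        # A real store would release its native resources here.
        self.closed = True


class BagStateSketch:
    """Hypothetical bag state backed by an in-process list."""

    def __init__(self):
        self._store = []

    def add(self, value):
        self._store.append(value)

    def read(self):
        # Bulk-load into memory: fixed contents, can be iterated many times.
        return tuple(self._store)

    def read_iterator(self):
        # Stream from the store: iterate once, then close.
        return FakeStoreIterator(self._store)


bag = BagStateSketch()
for v in (1, 2, 3):
    bag.add(v)

snapshot = bag.read()
bag.add(4)  # a later write does not change what read() already returned
first_pass = list(snapshot)
second_pass = list(snapshot)

with closing(bag.read_iterator()) as it:
    streamed = list(it)
leftover = list(it)  # the cursor is exhausted and closed, so nothing is left
```

The cost of read() here is the in-memory copy; the cost of read_iterator() is that the caller has to close it and can only walk it once.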

For the commit question, do you mean whether we commit in the middle of
processing a bundle? That's not the case in Samza. Samza will wait for
bundle processing to complete, flush the output and state modifications,
and then checkpoint.

Thanks,
Xinyu



On Tue, May 15, 2018 at 10:34 AM, Kenneth Knowles  wrote:

>
>
> On Tue, May 15, 2018 at 9:36 AM Lukasz Cwik  wrote:
>
>> If it always processes key+window pairs serially, then I'm not sure why
>> Samza needs to use a snapshot in the first place; it should be able to read
>> data from RocksDb directly.
>>
>> The other point of confusion is how a readIterator() is any different
>> than returning a wrapper that just invokes read().iterator() on a
>> BagState. If that is the case then this is a trivial change which doesn't
>> need to change the portability design.
>>
>
> I think this is a key point. Just calling out the obvious to talk about it:
>
>  - read() returns an iterable - a value with fixed contents that can be
> iterated many times
>  - readIterator() would return an iterator - a value with fixed contents
> that can be iterated only once
>
> In both cases, later writes should not affect the observed contents. So
> they both need some kind of snapshot for their lifetime. With
> read().iterator() the intermediate iterable is immediately garbage, but
> this is the reference queue based solution again, with some ref tracking
> for iterables/iterators.
>
> I now see what you mean about committing within a bundle; we definitely
> need to atomically commit the completion of upstream elements, output
> elements, and state modifications. What are your thoughts on this Xinyu?
>
> Happy to join a quick hangout.
>
> Kenn
>
>
>> The nuance is likely that the interface between what a Runner implements
>> to provide support for user state may give itself more visibility into what
>> is going on than what the portability framework can provide as is.
>>
>> On Tue, May 15, 2018 at 8:57 AM Kenneth Knowles  wrote:
>>
>>> OK, got it. But what consistency are you referring to? I was trying to
>>> point out that there's nothing but straight-line program order consistency.
>>> There's only one actor doing all the reads and all the writes.
>>>
>>> Kenn
>>>
>>> On Tue, May 15, 2018 at 8:39 AM Lukasz Cwik  wrote:
>>>
 I misspoke when I said portability semantics and should have said
 portability design/implementation. This is why I had a follow-up e-mail and
 clarified that I'm confused on:
 * I don't understand how you would want close to change the semantics
 of a user state specification and how it affects the lifetime of user
 state.
 ** Does it represent committing information within a bundle?
 ** Does it mean that user state can ignore the replayable and
 consistent semantics for the lifetime of a bundle?

 I'm trying to tie back what a `ReadableState
 readIterator()` means for Runner authors and how it solves the
 memory/close() problem for Samza. Based upon
 https://s.apache.org/beam-state, reading and writing of state must be
 consistent. Does this mean that Samza must use snapshots for the lifetime
 of a bundle? If so, I don't see how adding a `ReadableState readIterator()`
 allows Samza to ignore the consistency requirement and be allowed to free
 snapshots.

 It might be worthwhile to set up a three-way Hangouts call to help me, as
 I don't have the same level of context, and the outcome can be shared back
 to this thread. Xinyu / Kenn, how about we set up a time using Slack?

 On Mon, May 14, 2018 at 8:36 PM Kenneth Knowles  wrote:

> I feel like this discussion is kind of far from the primary intention.
> The point of ParDo(stateful DoFn) is to enable naive single-threaded code
> in a style intuitive to a beginning imperative programmer. So:
>
>  - the return value of read() should act like an immutable value
>  - if there is a read after a write, that read should reflect the
> changes written, because there's a trivial happens-before relationship
> induced by pr

Re: Support close of the iterator/iterable created from MapState/SetState

2018-05-15 Thread Xinyu Liu
I am happy to chat about it over hangout or slack too. Let's talk offline
to set it up if needed.

Thanks,
Xinyu


Re: Looking for contributors for Python 3 support

2018-05-15 Thread Valentyn Tymofieiev
Thanks, Robbe!

I also sent https://github.com/apache/beam-site/pull/441 to cover this
information on Beam Site for better visibility.

On Fri, May 11, 2018 at 6:36 AM, Robbe Sneyders 
wrote:

> Hello everyone,
>
> We have started adding Python 3 support to Beam. It took a while to get
> the best approach sorted out, but the first PR [1] has been merged and
> we're ready to start working on additional subpackages in parallel. We
> would like to prevent regression as much as possible, so any help to speed
> up the process is appreciated!
>
> A Kanban board with the open issues can be found at [2]. If you're
> interested in helping, you can select a subpackage to port and assign yourself
> the corresponding issue (or comment on the issue if you cannot assign
> yourself).
>
> The approach used is documented in [3]. We are currently working on step
> 2.
>
> When submitting a new PR, you can tag me (@RobbeSneyders), @aaltay,
> and @tvalentyn.
>
> Thanks!
> Robbe
>
> [1] https://github.com/apache/beam/pull/5053
> [2] https://issues.apache.org/jira/secure/RapidBoard.jspa?rapidView=245&view=detail&selectedIssue=BEAM-3058
> [3] https://docs.google.com/document/d/1xDG0MWVlDKDPu_IW9gtMvxi2S9I0GB0VDTkPhjXT0nE
>


Re: Survey: what is everyone working on that you want to share?

2018-05-15 Thread Valentyn Tymofieiev
Hi Kenn,

I sent https://github.com/apache/beam-site/pull/441 to cover efforts
related to Python 3 support in Beam.

Thanks,
Valentyn

On Tue, May 15, 2018 at 10:27 AM, David Morávek 
wrote:

> Hi Kenn,
>
> Java 8 DSL
>
> JIRA: dsl-euphoria / BEAM-3900
>
> Feature branch: dsl-euphoria
>
> Contact: David Moravek
>
> Description: Java 8 wrapper over the Beam Java SDK, based on the Euphoria
> API project.
>
>
> Thanks,
>
> David
>
>
>
> On Tue, May 15, 2018 at 7:27 AM, Kenneth Knowles  wrote:
>
>> Hi all,
>>
>> TL;DR: for anyone who is willing & able to answer: What big-picture thing
>> are you working on? I want to highlight it here:
>>
>> https://beam.apache.org/contribute/#works-in-progress
>>
>> (I want each of these to have a nice description, too! And a saved JIRA
>> search for starter tasks would be clever...)
>>
>>  context 
>>
>> Following feedback from the Beam summit and other discussions, I've been
>> working with Melissa and looping in folks to revamp the website to have a
>> more welcoming tone and to make it easier for newcomers to get started.
>>
>> Part of that is making the site and guide more concise and making the
>> entry points more prominent and interesting. Starter tasks are OK, but to
>> get lasting engagement the idea is for starter tasks to connect a newcomer
>> with ongoing projects. Most importantly, newcomers are then more likely to
>> have exciting interactions with experienced contributors, and to have
>> follow-up work after the starter task.
>>
>> Kenn
>>
>
>


Re: Survey: what is everyone working on that you want to share?

2018-05-15 Thread Kenneth Knowles
I wanted to bring back to this thread that, yes, I would most like pull
requests :-)

And please don't feel like you have to follow the pattern already there. It
would be better if each project had a sentence describing it (be concise,
still) and the whole content fit right in the page.



Java sdk tests are broken

2018-05-15 Thread Boyuan Zhang
Hey all,

Several Java tests are failing now. Would anyone like to take a look
at them? I collected two build failure links here:

   - https://scans.gradle.com/s/7dw7t34jxtqnm/
   - https://scans.gradle.com/s/xialvtd3wqu6u

Thanks so much for your attention!

Boyuan


Re: Java sdk tests are broken

2018-05-15 Thread Lukasz Cwik
I believe Thomas Groh reverted the offending PR with
https://github.com/apache/beam/commit/72b6c8f6d0923da641b5344eb6a1fa8c260d596c



Fwd: Missing dep in python sdk? "ImportError: No module named builtins"

2018-05-15 Thread Pablo Estrada
I believe this is a consequence of the recent work to futurize Python
code[1]. I believe Robbe and Valentyn are working on this effort.

We might just need to add the dependency to setup.py?

[1]
https://github.com/apache/beam/commit/cc8bc3fd88d7989cf28ab062117fede102e35bed#diff-7fc98ec2b3fade7eded997b58710422c

On Tue, May 15, 2018 at 3:52 PM Alex Amato  wrote:

> FYI
>
> Running from a clean git branch in a virtualenv.
> python setup.py test -s apache_beam.runners.portability.fun_api_runner_test
>
> I guess people have normally been installing these on their machines.
> Ideally this should be added as a dep in setup.py. I will add the latest
> version and see if I can get it to build properly that way.
>
> --
Got feedback? go/pabloem-feedback


Missing pypi dependency? "ImportError: No module named builtins"

2018-05-15 Thread Alex Amato
FYI, I encountered the error below: a module failed to import. Perhaps
some developers already have it installed on their machines? Was this a
recent change, and should a dependency be added to setup.py?

Running from a clean git branch in a virtualenv.
python setup.py test -s apache_beam.runners.portability.fun_api_runner_test

Traceback (most recent call last):
  File "setup.py", line 208, in <module>
    'test': generate_protos_first(test),
  File "/usr/local/google/home/ajamato/beam_git/beam/venv/local/lib/python2.7/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "setup.py", line 143, in run
    super(cmd, self).run()
  File "/usr/local/google/home/ajamato/beam_git/beam/venv/local/lib/python2.7/site-packages/setuptools/command/test.py", line 226, in run
    self.run_tests()
  File "/usr/local/google/home/ajamato/beam_git/beam/venv/local/lib/python2.7/site-packages/setuptools/command/test.py", line 248, in run_tests
    exit=False,
  File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python2.7/unittest/loader.py", line 91, in loadTestsFromName
    module = __import__('.'.join(parts_copy))
  File "/usr/local/google/home/ajamato/beam_git/beam/sdks/python/apache_beam/__init__.py", line 86, in <module>
    from apache_beam import coders
  File "/usr/local/google/home/ajamato/beam_git/beam/sdks/python/apache_beam/coders/__init__.py", line 19, in <module>
    from apache_beam.coders.coders import *
  File "/usr/local/google/home/ajamato/beam_git/beam/sdks/python/apache_beam/coders/coders.py", line 25, in <module>
    from builtins import object
ImportError: No module named builtins


Re: Missing dep in python sdk? "ImportError: No module named builtins"

2018-05-15 Thread Valentyn Tymofieiev
Adding a dependency on future in setup.py fixes this for me; sending
https://github.com/apache/beam/pull/5379.
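
For anyone following along, the idea is roughly the sketch below. It is not the contents of the PR: REQUIRED_PACKAGES here is a hypothetical one-entry excerpt with no version bounds (the thread does not state any), and missing_modules is an illustrative helper for checking whether an import would fail.

```python
import importlib

# Hypothetical excerpt of a setup.py dependency list; the other entries and
# any version bounds are omitted because the thread does not state them.
REQUIRED_PACKAGES = [
    'future',
]


def missing_modules(module_names):
    """Return the names from module_names that cannot be imported."""
    missing = []
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# On Python 3 `builtins` is part of the standard library; on Python 2 it is
# provided by the `future` package, hence the ImportError in a clean virtualenv.
print(missing_modules(['builtins']))
```

Running the check in a fresh Python 2.7 virtualenv before and after installing future is a quick way to confirm the dependency is what was missing.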



Re: Missing dep in python sdk? "ImportError: No module named builtins"

2018-05-15 Thread Pablo Estrada
That makes sense. Thanks a lot Valentyn!
