[python SDK] Returning Pub/Sub message_id and timestamp

2019-07-12 Thread Matthew Darwin
Good morning,

I'm very new to Beam, and pretty new to Python so please first accept my 
apologies for any obvious misconceptions/mistakes in the following.

I am currently trying to develop a sample pipeline in Python to pull messages 
from Pub/Sub and then write them to either files in cloud storage or to 
BigQuery. The ultimate goal will be to utilise the pipeline for real time 
streaming of event data to BigQuery (with various transformations) but also to 
store the raw messages long term in files in cloud storage.

At the moment, I'm simply trying to parse the message to get the PubSub 
messageId and publishTime in order to be able to write them into the output. 
The JSON of my Pub/Sub message looks like this:

[
  {
"ackId": 
"BCEhPjA-RVNEUAYWLF1GSFE3GQhoUQ5PXiM_NSAoRRIICBQFfH1xU1t1Xl8aB1ENGXJ8Zyc_XxcIB0BTeFVaEQx6bVxXOFcMEHF8YXZpWhUIA0FTfXeq5cveluzJNksxIbvE8KxfeqqmgfhiZho9XxJLLD5-PT5FQV5AEkw2C0RJUytDCypYEU4",
"message": {
  "attributes": {
"source": "python"
  },
  "data": "eyJyb3dudW1iZXIiOiAyfQ==",
  "messageId": "619310330691403",
  "publishTime": "2019-07-12T08:27:58.522Z"
}
  }
]
According to the documentation, the Pub/Sub message payload exposes the data 
and attributes properties; is there simply no way of retrieving the messageId 
and publishTime, or are these exposed somewhere else? If not, is the inclusion 
of these on the roadmap, and are they available when using Java? (I have zero 
Java experience, hence reaching for Python first.)

Kind regards,

Matthew



Re: [Java] TextIO not reading file as expected

2019-07-12 Thread Shannon Duncan
So as it turns out, it was an STDOUT issue with my logging and not a data
read-in problem. Beam operated just fine, but the way I was debugging was
causing the glitches.

Beam is operating as expected now.
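
For anyone finding this thread later, the pattern under discussion looks
roughly like this; the file path and field handling are illustrative, not
taken from the original pipeline:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class TabDelimitedRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    p.apply(TextIO.read().from("/path/to/records.tsv"))  // placeholder path
        .apply(ParDo.of(new DoFn<String, String>() {
          @ProcessElement
          public void processElement(@Element String line, OutputReceiver<String> out) {
            // TextIO emits one element per line, so each element here is a
            // whole record; the tab-splitting happens in user code.
            String[] fields = line.split("\t", -1);
            out.output(fields[0]);
          }
        }));
    p.run().waitUntilFinish();
  }
}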

On Thu, Jul 11, 2019 at 10:28 PM Kenneth Knowles  wrote:

> Doesn't sound good. TextIO has been around a long time so I'm surprised.
> Would you mind creating a ticket in Jira (
> https://issues.apache.org/jira/projects/BEAM/) and posting some technical
> details, like input/output/code snippets?
>
> Kenn
>
> On Thu, Jul 11, 2019 at 9:45 AM Shannon Duncan 
> wrote:
>
>> I have a tab-delimited file: every line is a record, with fields separated
>> by tabs.
>>
>> However as I read this file in using TextIO.read().from(filename) and
>> pass the results to a pardo, the elements are random chunks of the records.
>> I expected the element to be the entire line of text which then I'll do
>> parsing on from there.
>>
>> This file is processed perfectly fine in a Python pipeline with
>> ReadFromText. Just curious what would cause this on the Java side?
>>
>> Thanks,
>> Shannon
>>
>


Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
So I have my custom coder created for TreeMap and I'm ready to set it...

So my type is "TreeMap<String, List<Integer>>"

What do I put for ".setCoder(TreeMapCoder.of(???, ???))"

On Thu, Jul 11, 2019 at 8:21 PM Rui Wang  wrote:

> Hi Shannon, [1] will be a good start on coders in the Java SDK.
>
>
> [1]
> https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety
>
> Rui
>
> On Thu, Jul 11, 2019 at 3:08 PM Shannon Duncan 
> wrote:
>
>> Was able to get it to use ArrayList by doing List<List<Integer>> result =
>> new ArrayList<List<Integer>>();
>>
>> Then storing my keys in a separate array that I'll pass in as a side
>> input to key for the list of lists.
>>
>> Thanks for the help, lemme know more in the future about how coders work
>> and instantiate and I'd love to help contribute by adding some new coders.
>>
>> - Shannon
>>
>> On Thu, Jul 11, 2019 at 4:59 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> Will do. Thanks. A new coder for deterministic Maps would be great in
>>> the future. Thank you!
>>>
>>> On Thu, Jul 11, 2019 at 4:58 PM Rui Wang  wrote:
>>>
 I think Mike refers to ListCoder, which is deterministic if its element
 coder is deterministic. Maybe you can search the repo for examples of
 ListCoder?


 -Rui

 On Thu, Jul 11, 2019 at 2:55 PM Shannon Duncan <
 joseph.dun...@liveramp.com> wrote:

> So ArrayList doesn't work either, so just a standard List?
>
> On Thu, Jul 11, 2019 at 4:53 PM Rui Wang  wrote:
>
>> Shannon, I agree with Mike that List is a good workaround if the
>> elements within the list are deterministic and you are eager to get your
>> new pipeline working.
>>
>> Let me send some pointers on adding a new coder later.
>>
>>
>> -Rui
>>
>> On Thu, Jul 11, 2019 at 2:45 PM Shannon Duncan <
>> joseph.dun...@liveramp.com> wrote:
>>
>>> I just started learning Java today to attempt to convert our Python
>>> pipelines to Java to take advantage of key features that Java has. I have
>>> no idea how I would create a new coder and include it for Beam to
>>> recognize.
>>>
>>> If you can point me in the right direction of where it hooks
>>> together I might be able to figure that out. I can duplicate MapCoder 
>>> and
>>> try to make changes, but how will beam know to pick up that coder for a
>>> groupByKey?
>>>
>>> Thanks!
>>> Shannon
>>>
>>> On Thu, Jul 11, 2019 at 4:42 PM Rui Wang  wrote:
>>>
 It could be fairly straightforward to create a SortedMapCoder for
 TreeMap: add checks on map instances and then change
 verifyDeterministic.

 If this is a common need we could just submit it into Beam repo.

 [1]:
 https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java#L146

 On Thu, Jul 11, 2019 at 2:26 PM Mike Pedersen 
 wrote:

> There isn't a coder for deterministic maps in Beam, so even if
> your datastructure is deterministic, Beam will assume the serialized 
> bytes
> aren't deterministic.
>
> You could make one using the MapCoder as a guide:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/MapCoder.java
> Just change it such that the exception in verifyDeterministic is
> removed, and when decoding it instantiates a TreeMap or such instead of a
> HashMap.
>
> Alternatively, you could just represent your key as a sorted list
> of KV pairs. Lookups could be done using binary search if necessary.
>
> Mike
>
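> A sketch of that alternative, with String keys and Integer values assumed
> for illustration:
>
> import java.util.ArrayList;
> import java.util.Arrays;
> import java.util.Comparator;
> import java.util.List;
> import org.apache.beam.sdk.values.KV;
>
> public class SortedKvKey {
>   // Build the composite key as a list of KV pairs sorted by key, so its
>   // encoded form is stable regardless of insertion order.
>   static List<KV<String, Integer>> compositeKey() {
>     List<KV<String, Integer>> key = new ArrayList<>(
>         Arrays.asList(KV.of("b", 2), KV.of("a", 1)));
>     key.sort(Comparator.comparing(KV::getKey));
>     return key;
>   }
> }
>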
> On Thu, Jul 11, 2019 at 10:41 PM Shannon Duncan <
> joseph.dun...@liveramp.com>:
>
>> So I'm working on essentially doing a word-count on a complex
>> data structure.
>>
>> I tried just using a HashMap as the Structure, but that didn't
>> work because it is non-deterministic.
>>
>> However, when given a LinkedHashMap or TreeMap, which is
>> deterministic, the SDK complains that it's non-deterministic when trying
>> to use it as a key for GroupByKey.
>>
>> What would be an appropriate Map style data structure that would
>> be deterministic enough for Apache Beam to accept it as a key?
>>
>> Thanks,
>> Shannon
>>
>


Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));
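
Spelled out, with TreeMapCoder standing in for the custom coder built in this
thread (its exact API is assumed), the composition looks like this:

import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.coders.VarIntCoder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;

public class TreeMapCoderUsage {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
    p.apply(Create.of("rows"))
        .apply(ParDo.of(new DoFn<String, TreeMap<String, List<Integer>>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            TreeMap<String, List<Integer>> key = new TreeMap<>();
            key.put(c.element(), Arrays.asList(1, 2, 3));
            c.output(key);
          }
        }))
        // One component coder per type parameter of TreeMap<String, List<Integer>>:
        // StringUtf8Coder for the String keys, ListCoder.of(VarIntCoder.of())
        // for the List<Integer> values.
        .setCoder(TreeMapCoder.of(StringUtf8Coder.of(),
            ListCoder.of(VarIntCoder.of())));
    p.run().waitUntilFinish();
  }
}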

On Fri, Jul 12, 2019 at 10:22 AM Shannon Duncan 
wrote:

> So I have my custom coder created for TreeMap and I'm ready to set it...
>
> So my type is "TreeMap<String, List<Integer>>"
>
> What do I put for ".setCoder(TreeMapCoder.of(???, ???))"

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
Aha, makes sense. Thanks!

On Fri, Jul 12, 2019 at 9:26 AM Lukasz Cwik  wrote:

> TreeMapCoder.of(StringUtf8Coder.of(), ListCoder.of(VarIntCoder.of()));

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
I have a working TreeMapCoder now. Got it all set up and done, and the
GroupByKey is accepting it.

Thanks for all the help. I need to read up more on the contributing guidelines,
then I'll PR the coder into the SDK. Also willing to write coders for
things such as ArrayList etc. if people want them.

On Fri, Jul 12, 2019 at 9:31 AM Shannon Duncan 
wrote:

> Aha, makes sense. Thanks!

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Lukasz Cwik
Additional coders would be useful. Note that we usually don't have coders
for specific collection types like ArrayList but prefer to have Coders for
their general counterparts like List, Map, and Iterable.

There has been discussion in the past about making the MapCoder a
deterministic coder when a coder is required to be deterministic. There are a
few people working on schema support within Apache Beam who might be able to
provide guidance (+Reuven Lax, +Brian Hulette).
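
For the archives, here is a minimal sketch of the TreeMapCoder discussed in
this thread, modeled on Beam's MapCoder but with verifyDeterministic relaxed;
treat it as an illustration, not a committed Beam API:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import org.apache.beam.sdk.coders.Coder;
import org.apache.beam.sdk.coders.StructuredCoder;
import org.apache.beam.sdk.coders.VarIntCoder;

public class TreeMapCoder<K, V> extends StructuredCoder<TreeMap<K, V>> {

  public static <K, V> TreeMapCoder<K, V> of(Coder<K> keyCoder, Coder<V> valueCoder) {
    return new TreeMapCoder<>(keyCoder, valueCoder);
  }

  private final Coder<K> keyCoder;
  private final Coder<V> valueCoder;

  private TreeMapCoder(Coder<K> keyCoder, Coder<V> valueCoder) {
    this.keyCoder = keyCoder;
    this.valueCoder = valueCoder;
  }

  @Override
  public void encode(TreeMap<K, V> map, OutputStream out) throws IOException {
    // Entries are written in the TreeMap's sorted order, which is what makes
    // the encoded bytes deterministic (unlike HashMap's iteration order).
    VarIntCoder.of().encode(map.size(), out);
    for (Map.Entry<K, V> entry : map.entrySet()) {
      keyCoder.encode(entry.getKey(), out);
      valueCoder.encode(entry.getValue(), out);
    }
  }

  @Override
  public TreeMap<K, V> decode(InputStream in) throws IOException {
    int size = VarIntCoder.of().decode(in);
    // Assumes K is naturally Comparable, as a bare TreeMap requires.
    TreeMap<K, V> map = new TreeMap<>();
    for (int i = 0; i < size; i++) {
      K key = keyCoder.decode(in);
      map.put(key, valueCoder.decode(in));
    }
    return map;
  }

  @Override
  public List<? extends Coder<?>> getCoderArguments() {
    return Arrays.asList(keyCoder, valueCoder);
  }

  @Override
  public void verifyDeterministic() throws NonDeterministicException {
    // Unlike MapCoder, don't fail unconditionally: a sorted map's encoding
    // is deterministic whenever its component coders are.
    verifyDeterministic(this, "Key coder must be deterministic", keyCoder);
    verifyDeterministic(this, "Value coder must be deterministic", valueCoder);
  }
}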

On Fri, Jul 12, 2019 at 11:05 AM Shannon Duncan 
wrote:

> I have a working TreeMapCoder now. Got it all set up and done, and the
> GroupByKey is accepting it.
>
> Thanks for all the help. I need to read up more on the contributing guidelines,
> then I'll PR the coder into the SDK. Also willing to write coders for
> things such as ArrayList etc. if people want them.

Re: [Java] Using a complex datastructure as Key for KV

2019-07-12 Thread Shannon Duncan
I tried to pass an ArrayList in and it wouldn't generalize it to List; it
required me to convert my ArrayLists to Lists.
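
In code, that just means declaring against the interface (the element type
below is illustrative):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListsNotArrayLists {
  // Declared as List<List<Integer>>, Beam's ListCoder machinery applies;
  // the concrete ArrayList behind it stays an implementation detail.
  static List<List<Integer>> rows() {
    List<List<Integer>> rows = new ArrayList<>();
    rows.add(new ArrayList<>(Arrays.asList(1, 2, 3)));
    return rows;
  }
}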

On Fri, Jul 12, 2019 at 10:20 AM Lukasz Cwik  wrote:

> Additional coders would be useful. Note that we usually don't have coders
> for specific collection types like ArrayList but prefer to have Coders for
> their general counterparts like List, Map, and Iterable.
>
> There has been discussion in the past about making the MapCoder a
> deterministic coder when a coder is required to be deterministic. There are
> a few people working on schema support within Apache Beam who might be able
> to provide guidance (+Reuven Lax, +Brian Hulette).

Re: [python SDK] Returning Pub/Sub message_id and timestamp

2019-07-12 Thread Valentyn Tymofieiev
Hi Matthew,

Welcome to Beam!

Looking at the Python PubSub IO API, you should be able to access the id and
timestamp by setting `with_attributes=True` when using the `ReadFromPubSub`
PTransform; see [1, 2].

[1]
https://github.com/apache/beam/blob/0fce2b88660f52dae638697e1472aa108c982ae6/sdks/python/apache_beam/io/gcp/pubsub.py#L61
[2]
https://github.com/apache/beam/blob/0fce2b88660f52dae638697e1472aa108c982ae6/sdks/python/apache_beam/io/gcp/pubsub.py#L138
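
A minimal sketch of that suggestion (the subscription path is a placeholder,
and which metadata fields PubsubMessage exposes varies by SDK version; the
publish time also surfaces as the element's event timestamp):

import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.options.pipeline_options import PipelineOptions


def describe(msg, ts=beam.DoFn.TimestampParam):
    # With with_attributes=True each element is a PubsubMessage rather than
    # raw bytes; by default the publish time becomes the element timestamp.
    return {
        'data': msg.data,
        'attributes': dict(msg.attributes),
        'publish_time': ts.to_utc_datetime(),
    }


options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | ReadFromPubSub(
         subscription='projects/my-project/subscriptions/my-sub',  # placeholder
         with_attributes=True)
     | beam.Map(describe)
     | beam.Map(print))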



[python] ReadFromPubSub broken in Flink

2019-07-12 Thread Chad Dombrova
Hi all,
This error came as a bit of a surprise.

Here’s a snippet of the traceback (full traceback below).

  File "apache_beam/runners/common.py", line 751, in
apache_beam.runners.common.DoFnRunner.process
return self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 423, in
apache_beam.runners.common.SimpleInvoker.invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"/Users/chad/dev/beam-tests/.venv/lib/python2.7/site-packages/apache_beam/io/iobase.py",
line 860, in split_source
AttributeError: '_PubSubSource' object has no attribute
'estimate_size' [while running 'PubSubInflow/Read/Split']


Flink is using _PubSubSource which is, as far as I can tell, a stub that
should be replaced at runtime by an actual streaming source. Is this error
a bug or a known limitation? If the latter, is there a Jira issue and any
momentum to solve this?

I’m pretty confused by this because the Apache Beam Portability Support
Matrix [1] makes it pretty clear that Flink supports streaming, and the
docs for “Built-in I/O Transforms” [2] list Google PubSub and BigQuery as the
only IO transforms that support streaming, so if streaming works with
Flink, PubSub should probably be the thing it works with.

I'm using Beam 2.13.0 and Flink 1.8.

thanks,
chad

[1]
https://docs.google.com/spreadsheets/d/1KDa_FGn1ShjomGd-UUDOhuh2q73de2tPz6BqHpzqvNI/edit#gid=0
[2] https://beam.apache.org/documentation/io/built-in/
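
For reference, a minimal pipeline along these lines reproduces the error
(topic and job endpoint are placeholders):

import apache_beam as beam
from apache_beam.io.gcp.pubsub import ReadFromPubSub
from apache_beam.options.pipeline_options import PipelineOptions

# ReadFromPubSub relies on a runner-side override of the _PubSubSource stub;
# on the portable Flink runner the stub executes directly and fails as shown
# in the traceback below.
options = PipelineOptions([
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',  # placeholder Flink job server address
    '--streaming',
])
with beam.Pipeline(options=options) as p:
    (p
     | 'PubSubInflow' >> ReadFromPubSub(
         topic='projects/my-project/topics/my-topic')  # placeholder
     | beam.Map(print))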

Full traceback:

Caused by: java.util.concurrent.ExecutionException:
java.lang.RuntimeException: Error received from SDK harness for
instruction 5: Traceback (most recent call last):
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 157, in _execute
response = task()
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 190, in 
self._execute(lambda: worker.do_instruction(work), work)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 333, in do_instruction
request.instruction_id)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/sdk_worker.py",
line 359, in process_bundle
bundle_processor.process_bundle(instruction_id))
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
line 589, in process_bundle
].process_encoded(data.data)
  File 
"/usr/local/lib/python2.7/site-packages/apache_beam/runners/worker/bundle_processor.py",
line 143, in process_encoded
self.output(decoded_value)
  File "apache_beam/runners/worker/operations.py", line 246, in
apache_beam.runners.worker.operations.Operation.output
def output(self, windowed_value, output_index=0):
  File "apache_beam/runners/worker/operations.py", line 247, in
apache_beam.runners.worker.operations.Operation.output
cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 143, in
apache_beam.runners.worker.operations.SingletonConsumerSet.receive
self.consumer.process(windowed_value)
  File "apache_beam/runners/worker/operations.py", line 583, in
apache_beam.runners.worker.operations.DoOperation.process
with self.scoped_process_state:
  File "apache_beam/runners/worker/operations.py", line 584, in
apache_beam.runners.worker.operations.DoOperation.process
delayed_application = self.dofn_receiver.receive(o)
  File "apache_beam/runners/common.py", line 747, in
apache_beam.runners.common.DoFnRunner.receive
self.process(windowed_value)
  File "apache_beam/runners/common.py", line 753, in
apache_beam.runners.common.DoFnRunner.process
self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 807, in
apache_beam.runners.common.DoFnRunner._reraise_augmented
raise_with_traceback(new_exn)
  File "apache_beam/runners/common.py", line 751, in
apache_beam.runners.common.DoFnRunner.process
return self.do_fn_invoker.invoke_process(windowed_value)
  File "apache_beam/runners/common.py", line 423, in
apache_beam.runners.common.SimpleInvoker.invoke_process
windowed_value, self.process_method(windowed_value.value))
  File 
"/Users/chad/dev/beam-tests/.venv/lib/python2.7/site-packages/apache_beam/io/iobase.py",
line 860, in split_source
AttributeError: '_PubSubSource' object has no attribute
'estimate_size' [while running 'PubSubInflow/Read/Split']


Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Awesome. I got it working for a single file, but for a structure of:

/part-0001/index
/part-0001/data
/part-0002/index
/part-0002/data

I tried to do /part-* and /part-*/data

It does not find the multipart files. However, if I just do /part-0001/data
it will find and read it.

Any ideas why?

I am using this to generate the source:

static SequenceFileSource<Text, Text> createSource(
    ValueProvider<String> sourcePattern) {
  return new SequenceFileSource<>(
      sourcePattern,
      Text.class,                    // key class
      WritableSerialization.class,   // key serialization
      Text.class,                    // value class
      WritableSerialization.class,   // value serialization
      SequenceFile.SYNC_INTERVAL);
}
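
Hooking that source into a pipeline then looks roughly like this, where
createSource is the helper above and the bucket path is a placeholder:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.Read;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.ValueProvider.StaticValueProvider;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.hadoop.io.Text;

public class SequenceFileReadExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    // Each element is one key/value record from the matched sequence file(s).
    PCollection<KV<Text, Text>> records = p.apply(Read.from(
        createSource(StaticValueProvider.of("gs://my-bucket/part-*/data"))));
    p.run().waitUntilFinish();
  }
}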

On Wed, Jul 10, 2019 at 10:52 AM Igor Bernstein 
wrote:

> It should be fairly straightforward:
> 1. Copy SequenceFileSource.java to your project
> 2. Add the source to your pipeline, configuring it with appropriate
> serializers. See here for an example for HBase Results
>
> On Wed, Jul 10, 2019 at 10:58 AM Shannon Duncan <
> joseph.dun...@liveramp.com> wrote:
>
>> If I wanted to go ahead and include this within a new Java Pipeline, what
>> would I be looking at for level of work to integrate?
>>
>> On Wed, Jul 3, 2019 at 3:54 AM Ismaël Mejía  wrote:
>>
>>> That's great. I can help whenever you need. We just need to choose its
>>> destination. Both the `hadoop-format` and `hadoop-file-system` modules
>>> are good candidates; I would even feel inclined to put it in its own
>>> module `sdks/java/extensions/sequencefile` to make it easier for
>>> final users to discover.
>>>
>>> A thing to consider is the SeekableByteChannel adapters, we can move
>>> that into hadoop-common if needed and refactor the modules to share
>>> code. Worth to take a look at
>>>
>>> org.apache.beam.sdk.io.hdfs.HadoopFileSystem.HadoopSeekableByteChannel#HadoopSeekableByteChannel
>>> to see if some of it could be useful.
>>>
>>> On Tue, Jul 2, 2019 at 11:46 PM Igor Bernstein 
>>> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > I wrote those classes with the intention of upstreaming them to Beam.
>>> I can try to make some time this quarter to clean them up. I would need a
>>> bit of guidance from a Beam expert on how to make them coexist with
>>> HadoopFormatIO though.
>>> >
>>> >
>>> > On Tue, Jul 2, 2019 at 10:55 AM Solomon Duskis 
>>> wrote:
>>> >>
>>> >> +Igor Bernstein who wrote the Cloud Bigtable Sequence File classes.
>>> >>
>>> >> Solomon Duskis | Google Cloud clients | sdus...@google.com |
>>> 914-462-0531
>>> >>
>>> >>
>>> >> On Tue, Jul 2, 2019 at 4:57 AM Ismaël Mejía 
>>> wrote:
>>> >>>
>>> >>> (Adding dev@ and Solomon Duskis to the discussion)
>>> >>>
>>> >>> I was not aware of these thanks for sharing David. Definitely it
>>> would
>>> >>> be a great addition if we could have those donated as an extension in
>>> >>> the Beam side. We can even evolve them in the future to be more
>>> FileIO
>>> >>> like. Any chance this can happen? Maybe Solomon and his team?
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Tue, Jul 2, 2019 at 9:39 AM David Morávek 
>>> wrote:
>>> >>> >
>>> >>> > Hi, you can use SequenceFileSink and Source, from a BigTable
>>> client. Those work nicely with FileIO.
>>> >>> >
>>> >>> >
>>> https://github.com/googleapis/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSink.java
>>> >>> >
>>> https://github.com/googleapis/cloud-bigtable-client/blob/master/bigtable-dataflow-parent/bigtable-beam-import/src/main/java/com/google/cloud/bigtable/beam/sequencefiles/SequenceFileSource.java
>>> >>> >
>>> >>> > It would be really cool to move these into Beam, but that's up to
>>> Googlers to decide whether they want to donate this.
>>> >>> >
>>> >>> > D.
>>> >>> >
>>> >>> > On Tue, Jul 2, 2019 at 2:07 AM Shannon Duncan <
>>> joseph.dun...@liveramp.com> wrote:
>>> >>> >>
>>> >>> >> It's not outside the realm of possibilities. For now I've created
>>> an intermediary step of a hadoop job that converts from sequence to text
>>> file.
>>> >>> >>
>>> >>> >> Looking into better options.
>>> >>> >>
>>> >>> >> On Mon, Jul 1, 2019, 5:50 PM Chamikara Jayalath <
>>> chamik...@google.com> wrote:
>>> >>> >>>
>>> >>> >>> Java SDK has a HadoopInputFormatIO using which you should be
>>> able to read Sequence files:
>>> https://github.com/apache/beam/blob/master/sdks/java/io/hadoop-format/src/main/java/org/apache/beam/sdk/io/hadoop/format/HadoopFormatIO.java
>>> >>> >>> I don't think there's a direct alternative for this for Python.
>>> >>> >>>
>>> >>> >>> Is it possible to write to a well-known format such as Avro
>>> instead of a Hadoop specific format which will allow you to read from both
>>>

Re: [Python] Read Hadoop Sequence File?

2019-07-12 Thread Shannon Duncan
Clarification on my previous message: this only happens on the local file
system, where it is unable to match a pattern string. Via a `gs://` path it
is able to match multiple files.


Re: [Java] TextIO not reading file as expected

2019-07-12 Thread Kenneth Knowles
Glad to hear it :-)

On Fri, Jul 12, 2019 at 6:33 AM Shannon Duncan 
wrote:

> So as it turns out, it was an STDOUT issue with my logging and not a data
> read-in problem. Beam operated just fine, but the way I was debugging was
> causing the glitches.
>
> Beam is operating as expected now.