Scio 0.5.3 released

2018-05-01 Thread Josh Baer
Hi all,

We just released Scio 0.5.3 with a few enhancements and bug fixes.

Cheers,
Josh

https://github.com/spotify/scio/releases/tag/v0.5.3

*"Lasiorhinus latifrons"*
Features

   - Add enabled parameter to SCollection#debug #1107

   - Support batching in BigtableIO #1057 #1112

   - Update TensorFlow to 1.8.0 #1003

   - Upgrade sparkey to 2.3.0 #1105

   - Add support for setting max memory usage for sparkey objects #1106

Bug fixes

   - Fix step names in the saveAsTypedBigQuery transform #1061 #1127
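
As an example of the first item, the new enabled parameter should let you leave debug calls in place and switch them off outside development. A small sketch, assuming debug takes prefix and enabled arguments; the isDev toggle is purely illustrative:

    // Illustrative toggle; a real job might read this from pipeline options.
    val isDev = sys.env.get("ENV").contains("dev")

    sc.parallelize(Seq("a", "b", "c"))
      // Prints each element with the prefix only when enabled is true;
      // otherwise the call is a no-op and can stay in production code.
      .debug(prefix = "elem: ", enabled = isDev)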
   


Re: Chasing "Cannot output with timestamp" errors

2018-05-01 Thread Carlos Alonso
Yes, it is Scio code. That example was only 1 ms off, true, but there are
other examples where the difference is bigger, around 10 ms or even a bit more...

Thanks!

On Tue, 1 May 2018 at 19:38, Kenneth Knowles  wrote:

> Hmm. Since they are different by 1ms I wonder if it is rounding /
> truncation combined with very slight skew between Pubsub & DF. Just a
> random guess. Your code does seem reasonable at first glance, from a Beam
> perspective (it is Scio code, yes?)
>
> Kenn
>
> On Tue, May 1, 2018 at 8:00 AM Carlos Alonso  wrote:
>
>> Ok, so after checking logs deeper I've found a line that seems to
>> identify the steps (Adding config for S2:
>> {"instructions":[{"name":"pubsubSubscriptionWithAttributes@{PubSubDataReader.scala:10}/PubsubUnboundedSource","originalName":"s13","outputs":..),
>> so that would mean that the exception is thrown from the reading from
>> PubSub step in which I actually run this code:
>>
>> sc.pubsubSubscriptionWithAttributes[String](s"projects/$projectId/subscriptions/$subscription")
>>   .withName("Set timestamps")
>>   .timestampBy(_ => new Instant)
>>   .withName("Apply windowing")
>>   .withFixedWindows(windowSize)
>>
>>
>> I'm setting the elements in the window when they're read because I'm
>> pairing them with the schemas read from BigQuery using a side transform
>> later on... Is it possible that the elements already have a (somehow
>> future) timestamp and this timestampBy transform is causing the issue?
>>
>> If that would be the case, the elements read from PubSub would need to
>> have a "later" timestamp than "now", as, by the exception message, my
>> transform, that is setting timestamps to "now" would actually be trying to
>> set them backwards... Does it make any sense? (I'll try to dive into the
>> read from PubSub transform to double check...)
>>
>> Thanks!
>>
>> On Tue, May 1, 2018 at 4:44 PM Carlos Alonso 
>> wrote:
>>
>>> Hi everyone!!
>>>
>>> I have a job that reads heterogeneous messages from PubSub and,
>>> depending on its type, writes them to the appropriate BigQuery table and I
>>> keep getting random "java.lang.IllegalArgumentException: Cannot output with
>>> timestamp" errors that I cannot identify, and I can't even figure out which
>>> part of the code is actually throwing the Exception by looking at the
>>> stacktrace...
>>>
>>> You can find the full stacktrace here: https://pastebin.com/1gN4ED2A and
>>> a couple of job ids are this 2018-04-26_08_56_42-10071326408494590980 and
>>> this: 2018-04-27_09_19_13-15798240702327849959
>>>
>>> Trying to, at least, figure out the source transform of the error, the
>>> logs says the trace was at stage S2, but I don't know how to identify which
>>> parts of my pipeline form which stages...
>>>
>>> Thanks!!
>>>
>>


Re: Chasing "Cannot output with timestamp" errors

2018-05-01 Thread Kenneth Knowles
Hmm. Since they are different by 1ms I wonder if it is rounding /
truncation combined with very slight skew between Pubsub & DF. Just a
random guess. Your code does seem reasonable at first glance, from a Beam
perspective (it is Scio code, yes?)

Kenn

On Tue, May 1, 2018 at 8:00 AM Carlos Alonso  wrote:

> Ok, so after checking logs deeper I've found a line that seems to identify
> the steps (Adding config for S2:
> {"instructions":[{"name":"pubsubSubscriptionWithAttributes@{PubSubDataReader.scala:10}/PubsubUnboundedSource","originalName":"s13","outputs":..),
> so that would mean that the exception is thrown from the reading from
> PubSub step in which I actually run this code:
>
> sc.pubsubSubscriptionWithAttributes[String](s"projects/$projectId/subscriptions/$subscription")
>   .withName("Set timestamps")
>   .timestampBy(_ => new Instant)
>   .withName("Apply windowing")
>   .withFixedWindows(windowSize)
>
>
> I'm setting the elements in the window when they're read because I'm
> pairing them with the schemas read from BigQuery using a side transform
> later on... Is it possible that the elements already have a (somehow
> future) timestamp and this timestampBy transform is causing the issue?
>
> If that would be the case, the elements read from PubSub would need to
> have a "later" timestamp than "now", as, by the exception message, my
> transform, that is setting timestamps to "now" would actually be trying to
> set them backwards... Does it make any sense? (I'll try to dive into the
> read from PubSub transform to double check...)
>
> Thanks!
>
> On Tue, May 1, 2018 at 4:44 PM Carlos Alonso  wrote:
>
>> Hi everyone!!
>>
>> I have a job that reads heterogeneous messages from PubSub and, depending
>> on its type, writes them to the appropriate BigQuery table and I keep
>> getting random "java.lang.IllegalArgumentException: Cannot output with
>> timestamp" errors that I cannot identify, and I can't even figure out which
>> part of the code is actually throwing the Exception by looking at the
>> stacktrace...
>>
>> You can find the full stacktrace here: https://pastebin.com/1gN4ED2A and
>> a couple of job ids are this 2018-04-26_08_56_42-10071326408494590980 and
>> this: 2018-04-27_09_19_13-15798240702327849959
>>
>> Trying to, at least, figure out the source transform of the error, the
>> logs says the trace was at stage S2, but I don't know how to identify which
>> parts of my pipeline form which stages...
>>
>> Thanks!!
>>
>


Re: Chasing "Cannot output with timestamp" errors

2018-05-01 Thread Carlos Alonso
Ok, so after checking logs deeper I've found a line that seems to identify
the steps (Adding config for S2:
{"instructions":[{"name":"pubsubSubscriptionWithAttributes@{PubSubDataReader.scala:10}/PubsubUnboundedSource","originalName":"s13","outputs":..),
so that would mean that the exception is thrown from the reading from
PubSub step in which I actually run this code:

sc.pubsubSubscriptionWithAttributes[String](s"projects/$projectId/subscriptions/$subscription")
  .withName("Set timestamps")
  .timestampBy(_ => new Instant)
  .withName("Apply windowing")
  .withFixedWindows(windowSize)


I'm timestamping and windowing the elements as they're read because I'm
pairing them with the schemas read from BigQuery using a side transform
later on... Is it possible that the elements already have a (somehow
future) timestamp and this timestampBy transform is causing the issue?

If that were the case, the elements read from PubSub would have to carry a
"later" timestamp than "now": going by the exception message, my transform,
which sets timestamps to "now", would effectively be trying to move them
backwards... Does that make sense? (I'll try to dive into the read-from-
PubSub transform to double check...)
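
One thing I might try, assuming Scio's timestampBy accepts an
allowedTimestampSkew argument like Beam's outputWithTimestamp machinery
does: Beam only rejects timestamps that move backwards past the allowed
skew, so if PubSub stamps messages a millisecond or two ahead of the
worker clock, tolerating a small skew might make the error go away (the
100 ms value below is arbitrary):

    import org.joda.time.{Duration, Instant}

    sc.pubsubSubscriptionWithAttributes[String](
        s"projects/$projectId/subscriptions/$subscription")
      .withName("Set timestamps")
      // Allow the re-stamped "now" to fall slightly behind the
      // timestamps PubSub already assigned to the elements.
      .timestampBy(_ => new Instant, allowedTimestampSkew = Duration.millis(100))
      .withName("Apply windowing")
      .withFixedWindows(windowSize)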

Thanks!

On Tue, May 1, 2018 at 4:44 PM Carlos Alonso  wrote:

> Hi everyone!!
>
> I have a job that reads heterogeneous messages from PubSub and, depending
> on its type, writes them to the appropriate BigQuery table and I keep
> getting random "java.lang.IllegalArgumentException: Cannot output with
> timestamp" errors that I cannot identify, and I can't even figure out which
> part of the code is actually throwing the Exception by looking at the
> stacktrace...
>
> You can find the full stacktrace here: https://pastebin.com/1gN4ED2A and
> a couple of job ids are this 2018-04-26_08_56_42-10071326408494590980 and
> this: 2018-04-27_09_19_13-15798240702327849959
>
> Trying to, at least, figure out the source transform of the error, the
> logs says the trace was at stage S2, but I don't know how to identify which
> parts of my pipeline form which stages...
>
> Thanks!!
>