Scio 0.10.0 released

2021-03-03 Thread Neville Li
Hi all,

We just released Scio 0.10.0. Here's a short summary of the notable changes
since 0.9.x:

- Better decoupled Google Cloud Platform dependencies
- Simplified coder implicits for faster compilation
- Sort Merge Bucket performance improvements and bug fixes
- Type-safe Parquet support
- Parquet SMB support
- TensorFlow 2.x support
- ZetaSketch-compatible HyperLogLog++
- Redis IO and DoFn
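For context on the Sort Merge Bucket items above: SMB avoids a full shuffle by bucketing and key-sorting both datasets at write time, so the join becomes a streaming merge of pre-sorted data. A minimal sketch of that merge step in plain Scala (illustrative only, not Scio's actual implementation):

```scala
// Merge-join two key-sorted sequences without shuffling or hashing:
// advance two cursors in lockstep and emit the cross product of runs
// that share a key. This is the core idea behind SMB reads.
def mergeJoin[K: Ordering, A, B](
    lhs: Seq[(K, A)],
    rhs: Seq[(K, B)]
): Seq[(K, (A, B))] = {
  val ord = implicitly[Ordering[K]]
  val out = scala.collection.mutable.Buffer.empty[(K, (A, B))]
  var i = 0
  var j = 0
  while (i < lhs.length && j < rhs.length) {
    val c = ord.compare(lhs(i)._1, rhs(j)._1)
    if (c < 0) i += 1
    else if (c > 0) j += 1
    else {
      val k = lhs(i)._1
      // Find the end of the run of equal keys on each side.
      val iEnd = lhs.indexWhere(_._1 != k, i) match { case -1 => lhs.length; case n => n }
      val jEnd = rhs.indexWhere(_._1 != k, j) match { case -1 => rhs.length; case n => n }
      for (x <- i until iEnd; y <- j until jEnd)
        out += ((k, (lhs(x)._2, rhs(y)._2)))
      i = iEnd
      j = jEnd
    }
  }
  out.toSeq
}

val users  = Seq(1 -> "alice", 2 -> "bob", 4 -> "dan")
val orders = Seq(2 -> "book", 2 -> "pen", 3 -> "mug")
val joined = mergeJoin(users, orders)
// joined == Seq((2, ("bob", "book")), (2, ("bob", "pen")))
```

Because both inputs arrive sorted on the key, the whole join is a single linear pass, which is why SMB can skip the shuffle a regular join would need.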

https://github.com/spotify/scio/releases/tag/v0.10.0

Cheers,
Neville


Scio 0.9.5 released

2020-10-02 Thread Neville Li
Hi all,

We just released Scio 0.9.5. This release upgrades Beam to the latest
2.24.0 and includes several improvements and bug fixes, including Parquet
Avro dynamic destinations, Scalable Bloom Filter and many others. This will
also likely be the last 0.9.x release before we start working on the next
0.10 branch.
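For anyone unfamiliar with the Scalable Bloom Filter mentioned above: a Bloom filter answers set-membership queries with no false negatives and a tunable false-positive rate; the "scalable" variant grows by chaining filters as elements arrive. A toy fixed-size version, just to show the mechanics (not Scio's implementation):

```scala
// Toy Bloom filter: k hash probes into a bit array. An element "might
// be present" if all its bits are set, and is definitely absent
// otherwise. Illustrative only; real filters use better hash families.
class ToyBloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new Array[Boolean](numBits)
  // Derive k probe positions from the element paired with a seed.
  private def probes(s: String): Seq[Int] =
    (1 to numHashes).map(i => math.abs((s, i).hashCode) % numBits)
  def add(s: String): Unit = probes(s).foreach(bits(_) = true)
  def mightContain(s: String): Boolean = probes(s).forall(bits(_))
}

val bf = new ToyBloomFilter(numBits = 1024, numHashes = 3)
Seq("spotify", "scio", "beam").foreach(bf.add)
assert(bf.mightContain("scio")) // always true for added elements
// bf.mightContain("flink") is usually false, but can be a false positive
```

The scalable variant adds a new, larger filter once the current one gets too full, so the overall false-positive rate stays bounded without sizing the filter up front.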

Join our #scio channel for questions and discussions.

Cheers
Neville

https://github.com/spotify/scio/releases/tag/v0.9.5

*"Colovaria"*

There are no breaking changes in this release, but some were introduced
with v0.9.0. See the v0.9.0 Migration Guide for detailed instructions.
Improvements

   - Add custom GenericJson pretty print (#3367)
   - scio-parquet to support dynamic destinations for windowed SCollections
   (#3356)
   - Support $LATEST replacement for Query (#3357)
   - Mutable ScalableBloomFilter (#3339)
   - Add specialized TupleCoders (#3350)
   - Add nullCoder on Record and Disjunction coders (#3349)

Bug Fixes

   - Support null-key records in SMB writes (#3359)
   - Fix serialization struggles in SMB transform API (#3342)
   - Grammar / spelling fixes in migration guides (#3358)
   - Remove unused macro import (#3353)
   - Remove unused BaseSeqLikeCoder implicit (#3344)
   - Filter out potentially included env directories (#3322)
   - Simplify LowPriorityCoders (#3320)
   - Remove unused and not useful Coder implicit trait (#3319)
   - Make javaBeanCoder lower priority (#3318)

Dependency Updates

   - Update Beam to 2.24.0 (#3325)
   - Update scalafmt-core to 2.7.3 (#3364)
   - Update elasticsearch-rest-client, ... to 7.9.2 (#3347)
   - Update hadoop libs to 2.8.5 (#3337)
   - Update sbt-scalafix to 0.9.21 (#3335)
   - Update sbt-mdoc to 2.2.9 (#3327)
   - Update sbt-avro to 3.1.0 (#3323)
   - Update mysql-socket-factory to 1.1.0 (#3321)
   - Update scala-collection-compat to 2.2.0 (#3312)
   - Update sbt-mdoc to 2.2.8 (#3313)


Re: Scio community hangout

2020-09-24 Thread Neville Li
So this is happening next Thu Oct 1 12PM US EDT/9AM PDT/16:00 UTC. Drop in
if you want to chat or see what we're up to. See you there!

meet.google.com/vze-suyd-kwd

On Mon, Sep 14, 2020 at 1:51 PM Neville Li  wrote:

> Cross posting here in case you're interested or already using Scio
> <https://github.com/spotify/scio>, the Scala API for Apache Beam.
>
> ---------- Forwarded message ---------
> From: Neville Li 
> Date: Mon, Sep 14, 2020 at 1:49 PM
> Subject: Scio community hangout
> To: Scio Users 
>
>
> Hi all,
>
> With the current trend of conferences/meetups going virtual, it's hard to
> have casual hallway chat about data, Beam, Scio, etc. We're thinking about
> doing a Scio community hangout, to share progress, roadmap, priorities, but
> also to catch up with the community, hear your feedback, use cases and
> knowledge share.
>
> We're thinking the last week of Sept, noon US Eastern Time so it's
> feasible for west coast & EU folks. Please fill in the Doodle with your
> date preference. Hangout link to follow. We'll make this recurring if
> there's enough interest.
> https://doodle.com/poll/2hcazuzcs4kvw9tu
>
> Cheers,
> Neville
>


Fwd: Scio community hangout

2020-09-14 Thread Neville Li
Cross posting here in case you're interested or already using Scio
<https://github.com/spotify/scio>, the Scala API for Apache Beam.

---------- Forwarded message ---------
From: Neville Li 
Date: Mon, Sep 14, 2020 at 1:49 PM
Subject: Scio community hangout
To: Scio Users 


Hi all,

With the current trend of conferences/meetups going virtual, it's hard to
have casual hallway chat about data, Beam, Scio, etc. We're thinking about
doing a Scio community hangout, to share progress, roadmap, priorities, but
also to catch up with the community, hear your feedback, use cases and
knowledge share.

We're thinking the last week of Sept, noon US Eastern Time so it's feasible
for west coast & EU folks. Please fill in the Doodle with your date
preference. Hangout link to follow. We'll make this recurring if there's
enough interest.
https://doodle.com/poll/2hcazuzcs4kvw9tu

Cheers,
Neville


Scio 0.9.4 released

2020-09-10 Thread Neville Li
Hi all,

We just released Scio 0.9.4.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.9.4

*"Deletrius"*

There are no breaking changes in this release, but some were introduced
with v0.9.0. See the v0.9.0 Migration Guide for detailed instructions.
Improvements

   - Add SCollection#filterNot (#3291)
   - Improve filterValues doc (#3290)
   - Add support for JDBC sharding by UUID encoded as string (#3307)
   - Add optimised coder derivation for AnyVal (#3296)
   - Support BigQuery Avro Format (#3221)
   - Support sparkey compression, fix #3210 (#3295)
   - Warn if sparkey is bigger than memory (#3280)
   - Warn on chained .groupByKey.join, fix #3278 (#3297)
   - [SMB] Delete file early in NativeFileSorter (#3274)
   - Change default SMB codec to Deflate to match Scio (#3247)
   - Add java.time LocalDate, LocalDateTime, LocalTime, Period, Duration
   coders (#3238)

Bug Fixes

   - Remove duplicate ShardedSparkeyReader
   - Use andThen for future side effect ops (#3275)
   - Fix BigQuery IT test
   - Add state to exception message when pipeline is cancelled (#3270)
   - Avoid scala.jdk.CollectionConverters implicit import in Avro macro
   (#3250)
   - Avoid scala.jdk.CollectionConverters implicit import in BigQuery macro
   (#3240)
   - fix(avro-bq): Add EnumSymbol case for matching Avro types to BQ
   TableRow (#3226) (#3232)
   - Remove unneeded LowPriority implicit (#3239)
   - Remove coders deprecation warns (#3242)
   - Use ## and support consistent Array hashCode (#3246)
   - Improve SMB error handling (#3253)
   - Workaround for sorter memory limit #3260 (#3269)
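On the consistent Array hashCode fix above: JVM arrays use identity-based hashCode, so two structurally equal arrays usually hash differently, which breaks hashing-based transforms unless the coder hashes contents instead. A quick illustration of the underlying behavior:

```scala
// JVM arrays hash by identity, so equal-content arrays usually get
// different hash codes; content-based hashing is what coders need.
val a = Array(1, 2, 3)
val b = Array(1, 2, 3)
// a.hashCode == b.hashCode is almost never true (identity hashes).
val ha = java.util.Arrays.hashCode(a) // hashes the contents
val hb = java.util.Arrays.hashCode(b)
assert(ha == hb)
// Scala's ## is structural for tuples, case classes and collections...
assert((1, "x").## == (1, "x").##)
// ...but NOT for raw arrays, which still fall back to identity.
```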

Dependency Updates

   - Bump sparkey to 3.2.0
   - Remove unused imports (#3243)
   - Update case-app, case-app-annotations, ... to 2.0.4 (#3256)
   - Update cassandra-all to 3.11.8 (#3281)
   - Update cassandra-driver-core to 3.10.2 (#3276)
   - Update commons-io to 2.8.0 (#3310)
   - Update elasticsearch, ... to 7.9.1 (#3301)
   - Update elasticsearch, ... to 6.8.12 (#3264)
   - Update flink runner to 1.10.1 (#3249)
   - Update magnolia to 0.17.0 (#3262)
   - Update magnolify-avro, magnolify-bigtable, ... to 0.2.3 (#3263)
   - Update parquet-avro, parquet-column, ... to 1.11.1 (#3251)
   - Update protobuf-generic to 0.2.9 (#3227)
   - Update protobuf-java to 3.13.0 (#3257)
   - Update sbt-avro to 3.0.0 (#3252)
   - Update sbt-bloop to 1.4.4 (#3287)
   - Update sbt-buildinfo to 0.10.0 (#3245)
   - Update sbt-java-formatter to 0.6.0 (#3259)
   - Update sbt-jmh to 0.4.0 (#3288)
   - Update sbt-mdoc to 2.2.7 (#3311)
   - Update sbt-mima-plugin to 0.8.0 (#3305)
   - Update sbt-scalafix to 0.9.20 (#3298)
   - Update scalactic to 3.2.2 (#3271)
   - Update scalafmt-core to 2.7.0
   - Update scala

Scio 0.9.3 released

2020-08-05 Thread Neville Li
Hi all,

We just released Scio 0.9.3. This bumps Beam SDK to 2.23.0 and includes a
lot of improvements & bug fixes.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.9.3

*"Petrificus Totalus"*

There are no breaking changes in this release, but some were introduced
with v0.9.0. See the v0.9.0 Migration Guide for detailed instructions.
Improvements

   - Allow user-supplied filename prefix for SMB writes/reads (#3215)
   - Refactor SortedBucketTransform into a BoundedSource + reuse merge
   logic (#3097)
   - Add keyGroupFilter optimization to scio-smb (#3160)
   - Add error message to BaseAsyncLookupDoFn preconditions check (#3176)
   - Add Elasticsearch 5,6,7 add/update alias on multiple indices ops (#3134)
   - Add initial update alias op to ES7 (#2920)
   - Add ScioContext#applyTransform (#3146)
   - Allow SCollection#transform name override (#3142)
   - Allow setting default name through SCollection#applyTransform (#3144)
   - Update 0.9 migration doc and add BigQuery Type read schema
   documentation (#3148)

Bug Fixes

   - AvroBucketMetadata should validate keyPath (fix #3038) (#3140)
   - Allow union types in non-leaf field for key (#3187)
   - Fix issue with union type as non-leaf field of SMB key (#3193)
   - Fix ContextAndArgs#typed overloading issue (#3199)
   - Fix error propagation on Scala Future onSuccess callback (#3178)
   - Fix ByteBuffer should be readOnly (#3220)
   - Fix compiler warnings (#3183)
   - Fix JdbcShardedReadOptions.fetchSize description (#3209)
   - Fix FAQ typo (#3194)
   - Fix scalafix error in SortMergeBucketScioContextSyntax (#3158)
   - Add scalafix ExplicitReturnType and ProcedureSyntax rules (#3179)
   - Clean up a few more unused and unchecked params (#3223)
   - Use GcpOptions#getWorkerZone instead of deprecated GcpOptions#getZone
   (#3224)
   - Use raw coder in SCollection#applyKvTransform (#3171)
   - Add raw beam coder materializer (#3164)
   - Avoid circular dep between SCollection and PCollectionWrapper (#3163)
   - Remove unused param of internal partitionFn (#3166)
   - Remove unused CoderRegistry (#3165)
   - Remove defunct scio-bench (#3150)
   - Reuse applyTransform (#3162)
   - Make multijoin.py Python 3 compatible
   - Use TextIO#withCompression (#3145)

Dependency Updates

   - Update Beam SDK to 2.23.0 (#3197)
   - Update dependencies to be in line with 2.23.0 (#3225)
   - Update to scala 2.12.12 (#3157)
   - Update auto-value to 1.7.4 (#3147)
   - Update breeze to 1.1 (#3211)
   - Update cassandra-all to 3.11.7 (#3186)
   - Update cassandra-driver-core to 3.10.0 (#3195)
   - Update commons-lang3 to 3.11 (#3161)
   - Update commons-text to 1.9 (#3185)
   - Update contributing guidelines with current tools (#3149)
   - Update elasticsearch-rest-client, ... to 7.8.1 (#3192)
   - Update elasticsearch, ... to 6.8.11 (#3188)
   - Update jackson-module-scala to 2.1

Scio 0.9.0 released

2020-04-20 Thread Neville Li
Hi all,

We just released Scio 0.9.0. The biggest change was dropping Scala 2.11 and
adding 2.13 support. Also included are Guava/magnolia powered Bloom Filter,
improved test messages and other improvements.

For those using Featran  for feature
engineering, the matching release is 0.6.0.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.9.0

*"Furnunculus"*
Breaking changes

   - See v0.9.0 Migration Guide for detailed instructions
   - Remove deprecated elasticsearch2 (#2800)
   - Remove deprecated cassandra2 (#2801)
   - Remove deprecated tensorflow saveAsTfExampleFile (#2798)
   - Remove toEither from ScioUtil (#2799)
   - Remove ReflectiveRecordIO (#2856)
   - Remove context close in favor of run (#2858)
   - Remove deprecated ScioContext Future references (#2859)
   - Rework implicits/syntax for scio-extra bigquery package (#2844)
   - Remove implicit Coder requirement for .saveAsSortedBucket (#2839)
   - Drop scala 2.11 support (#2619)
   - Re-vamp Bloom filter and sparse-transforms (#2651)
   - Remove deprecated bigQuery, typedBigQuery and saveAsBigQuery (#2806)

Improvements

   - Add scala 2.13 support (#2619)
   - Add queryAsSource to BigQueryType (#2804)
   - Deprecate BigQueryType query in favor of queryRaw (#2857)
   - Make OptionCoder extend AtomicCoder (#2882)
   - Make iterable and traversable coders buffered (#2881)
   - Better support for alternative runners in tests (#2877)
   - Use UUID in SMB temp directory (#2849)
   - Reuse ApproxFilter (#2817)
   - Support metadata in AvroFileOperations (fix #2832) (#2834)
   - Add --help command line support for custom PipelineOptions (#2840)
   (#2843)
   - Add covary method to lift SCollection to the specified type (#2808)
   - Customize equality in unit tests and better failure message (#2733)
   - Add more convenience methods that support default transform names
   (#2805)

Bug Fixes

   - Make scio-spanner IT clients lazily created (#2889)
   - Generate tree eagerly before checking for private constructors (#2846)
   - Fix missing-bucket case when sink collection is empty (#2869)
   - Remove unneeded caffeine dep in scio-bigquery (#2861)
   - Fix sharded Sparkey string hashing behaviour for strings longer than
   one character (#2826)
   - Add magnolify BigtableType usage examples to scio-examples, fix #2789
   (#2816)
   - Check jobReference.location for query location (#2845)
   - Fix NPE in BaseAsyncLookupDoFn.Try#hashCode() (#2841)
   - Fix: cancel job on waitUntilFinish timeout (#2823)
   - Fix: full camelCase typed args support (#2777)

Dependency Updates

   - Update magnolify to 0.1.7
   - Update magnolia to 0.14.5 (#2886)
   - Update beam-runners-core-construction-java, ... to 2.20.0 (#2887)
   - Update scala-collection-compat to 2.1.5 (#2885)
   - Update gcs-connector to hadoop2-2.1.2 (#2842)
   - Update algebra to 2.0.1 (#2821)

Re: Scio 0.8.3 released

2020-03-30 Thread Neville Li
A quick follow up, we just released Scio 0.8.4. This is mostly a minor bug
fix release and the last one for 0.8.x.
We'll hopefully get an initial 0.9.0 release out in a few weeks.

https://github.com/spotify/scio/releases/tag/v0.8.4

*"Expecto Patronum"*
Features

   - Initial flink runner support #2785
   <https://github.com/spotify/scio/pull/2785>
   - Add support for default transform names #2760
   <https://github.com/spotify/scio/issues/2760> #2803
   <https://github.com/spotify/scio/pull/2803>
   - Add fromStorage doc #2792 <https://github.com/spotify/scio/issues/2792>

Bug fixes & improvements

   - Fix offset in SlidingWindows #2790
   <https://github.com/spotify/scio/pull/2790>
   - Reduce Consumer impl leak in SortedBucketTransform (#2795
   <https://github.com/spotify/scio/pull/2795>)
   - Revert "make CoGbkResult Iterable slightly lazy (#2749
   <https://github.com/spotify/scio/pull/2749>)" #2791
   <https://github.com/spotify/scio/pull/2791>


On Fri, Mar 20, 2020 at 2:10 PM Neville Li  wrote:

> Hi all,
>
> We just released Scio 0.8.3. This is mainly a bug fix release with some
> minor improvements.
>
> *Notes on JDK 11*
> JDK 11 is now supported starting with 0.8.2. To upgrade, you'll need to
> build your code and submit jobs to Dataflow with JDK 11.
>
> *Notes on Scala 2.11*
> Scio 0.8.x will be the last releases to support Scala 2.11. 0.9.x will
> support 2.12 & 2.13 only. Upgrading to 2.12 should be mostly transparent
> barring some stale external dependencies. Please reach out if you run into
> any issues.
>
> Cheers,
> Neville
>
> https://github.com/spotify/scio/releases/tag/v0.8.3
>
> *"Draconifors"*
> Features
>
>- Add CsvIO (#2701 <https://github.com/spotify/scio/pull/2701>)
>- Implement hashSubtractByKey (#2769
><https://github.com/spotify/scio/pull/2769>)
>- Add directory treatment and compression option to readFiles (#2756
><https://github.com/spotify/scio/pull/2756>)
>- Add FileNaming and prefix support to BinaryIO (#2759
><https://github.com/spotify/scio/pull/2759>)
>- Add Header and Footer support to TextIO (#2758
><https://github.com/spotify/scio/pull/2758>)
>- Add GenericJson coder (#2742
><https://github.com/spotify/scio/pull/2742>)
>- Add ProtobufUtil for Message<->Avro conversions (#2743
><https://github.com/spotify/scio/pull/2743>)
>
> Bug fixes & improvements
>
>- Add schema cache support in BigQueryType (#2775
><https://github.com/spotify/scio/pull/2775>)
>- Optimize KeyGroupIterator (#2779
><https://github.com/spotify/scio/pull/2779>)
>- Make CoGbkResult Iterable slightly lazy (#2749
><https://github.com/spotify/scio/pull/2749>)
>- Make java Enum coder deterministic #2761
><https://github.com/spotify/scio/issues/2761> #2764
><https://github.com/spotify/scio/pull/2764>
>- Fix saveAsTypedBigQuery deprecation message (#2763
><https://github.com/spotify/scio/pull/2763>)
>- Patch SortValues doFn to avoid re-enconding bytes (#2748
><https://github.com/spotify/scio/pull/2748>)
>- Fix: remove redundant withExtendedErrorInfo in dynamic BigQuery (
>#2744 <https://github.com/spotify/scio/pull/2744>)
>
>


Scio 0.8.3 released

2020-03-20 Thread Neville Li
Hi all,

We just released Scio 0.8.3. This is mainly a bug fix release with some
minor improvements.

*Notes on JDK 11*
JDK 11 is now supported starting with 0.8.2. To upgrade, you'll need to
build your code and submit jobs to Dataflow with JDK 11.

*Notes on Scala 2.11*
Scio 0.8.x will be the last releases to support Scala 2.11. 0.9.x will
support 2.12 & 2.13 only. Upgrading to 2.12 should be mostly transparent
barring some stale external dependencies. Please reach out if you run into
any issues.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.8.3

*"Draconifors"*
Features

   - Add CsvIO (#2701)
   - Implement hashSubtractByKey (#2769)
   - Add directory treatment and compression option to readFiles (#2756)
   - Add FileNaming and prefix support to BinaryIO (#2759)
   - Add Header and Footer support to TextIO (#2758)
   - Add GenericJson coder (#2742)
   - Add ProtobufUtil for Message<->Avro conversions (#2743)

Bug fixes & improvements

   - Add schema cache support in BigQueryType (#2775)
   - Optimize KeyGroupIterator (#2779)
   - Make CoGbkResult Iterable slightly lazy (#2749)
   - Make java Enum coder deterministic (#2761) (#2764)
   - Fix saveAsTypedBigQuery deprecation message (#2763)
   - Patch SortValues doFn to avoid re-encoding bytes (#2748)
   - Fix: remove redundant withExtendedErrorInfo in dynamic BigQuery (#2744)


Scio 0.8.2 released

2020-03-03 Thread Neville Li
Hi all,

We just released Scio 0.8.2. This is mainly a bug fix release with some
small improvements. Thanks to all the contributors!

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.8.2

*"Capacious Extremis"*
Features

   - Update Beam to 2.19.0 (#2665)
   - Add SCollection#tap (#2713)
   - Add SortedBucketTransform (#2693)
   - Add API bindings for SortedBucketTransform (#2712)
   - Add counters to SMB read transforms (#2704)
   - Add ParquetExampleIO (#2696)
   - Allow setting MIME type on RemoteFileUtil.upload (#2711)
   - Add BigQuery macro Avro Schema and toAvro support (#2646)

Bug fixes & improvements

   - Ignore updates on google libs that we need to keep in sync with beam
   (#2728)
   - Add all scio compile deps (#2634)
   - Bump circe to 0.13.0 (#2717)
   - Fix BigQuery Storage default params (#2692)
   - Fix: TableRow#getRecord returns Map (#2710)
   - Fix: eval coder eagerly for sealed traits (#2699)
   - Add readFiles alternative to deprecated readAll (#2697)
   - Minor update topByKey comments (#2695)
   - Remove duplicate stage files (#2678)
   - Make files to stage list mutable (#2676)
   - Make Coder[Set] throw NonDeterministicException (#2674)
   - Prevent sharded-sparkey shards from having negative indices (#2667)
   - Reduce SMB Dataflow graph size for many-bucketed joins (#2666)
   - Add test for #2649, Avro map field in SMB sink (#2664)
   - Force GenericRecord writer in SMB Avro, fix #2649 (#2650)
   - Fix: Add missing toAvro to BigQueryType (#2657)
   - Remove patched bigquery rows iterator (#2647)


Scio 0.8.0 released

2020-01-08 Thread Neville Li
Hi all,

We just released Scio 0.8.0. This is based on the most recent Beam 2.17.0
release and includes a lot of new features & bug fixes over the past 10
months.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.8.0

*"Amato Animo Animato Animagus"*
Breaking changes & deprecations

   - See v0.8.0 Migration Guide for detailed instructions
   - Remove @experimental from transform #2537
   - Deprecate scio-elasticsearch2 and scio-cassandra2 #2414 #2421
   - Deprecate hashFilter #2442
   - Deprecate legacy components in scio-extras #2533

Features

   - Bump Beam to 2.17.0 #2577
   - Add sharded Sparkey support #2336
   - Rework side input cache #2363
   - Cleanup Side Inputs API, introduce Singleton Set SideInputs #2424
   - Add schema support for GenericRecord #2514

Bug fixes & improvements

   - Add file:linenum only to outer transform #2405
   - Fix join transform names #2444
   - Remove Coder context bound for partitionByKey #2451
   - Rename that method argument in join functions to rhs #2466
   - Replace custom ClosureCleaner with chill's #2423
   - Use chill's Externalizer to serialize predicates in SCollectionMatchers
   #2410
   - Add errmsg when beamOpts == null in JobTest, fix #2430 #2545
   - Add bigQuerySelect() method with default flattenResults value #2500
   - Better consistency around BigQuery API #2412
   - Fail early on malformed BigQuery spec #2345
   - Rewrite typedBigQueryStorage #2434
   - Add DML query support to bigquery client #2418
   - Treat Avro array as java List in BigQuery read, fix #2068 #2415
   - Fix NPE in scio-bigtable's ChannelPoolCreator when credentials aren't
   set #2317
   - Fix bigtable scollection ops return type #2486
   - Refactor PubsubIO for more typesafety #2457
   - Avoid Mutation coder fallback for Spanner #2478
   - Fix Parquet sink suffix #2367
   - Improve iterable equality #2483
   - Improve back compat with Scio 0.7 #2401
   - Improve coder gen by checking companion implicits #2522
   - Make recursive coders serializable #2404
   - Remove kryo coder override in intermediate steps #2422
   - Fix fallback warning when implicit is in scope #2511
   - Improve the schema compatibility error message #2366
   - Remove schema fallback #2489
   - Add Schemas support for more types #2364
   - Assert FileStorage.isDone in MaterializeTap #2518
   - Add support for cleaning up TF models on shutdown #2549
   - Rework TensorFlow predict ops #2343
   - Remove unused/deprecated TensorFlow graph DoFn #2339
   - Mark some APIs in scio-extras as experimental #2517 #2572


Scio 0.8.0-beta2 released

2019-10-16 Thread Neville Li
Hi all,

We've just released Scio 0.8.0-beta2 (Scala API for Apache Beam for those
on the Beam mailing list). This will likely be the last beta before 0.8.0
stable release.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.8.0-beta2

Features

   - Bump Beam to 2.16.0 (#2292)
   - Bump featran to 0.4.0
   - Add DelegatingSideInput and TypedSparkeySideInput (#2287)
   - Add initial BigQuery methods with Beam schema (#2197)
   - Allow user to pass in Bigtable GcRule (#2256)
   - Support accessing all counters within JobTest (fix #2007) (#2247)
   - Add BigQuery Source type (#2214)
   - Add Storage API read from query (#2204)

Bug Fixes & Improvements

   - Segregate and deprecate avro reflective records support (#2300)
   - Fix lambda ser/de in SCollectionMatchers, #2294 (#2296)
   - Use PubsubIO.readMessagesWithCoderAndParseFn from Beam 2.16 (#2295)
   - Wrap PubsubWithAttributes write in composite PTransform (fix #2286)
   (#2288)
   - AvroType macro shouldn't generate duplicate record classes (fix #2264)
   (#2267)
   - Add Paradise dependency back to scio-repl classpath (fix #2265) (#2266)
   - Use JavaSerializer for encoding Throwable (fix #2249) (#2261)
   - Check name uniqueness without running on Dataflow (#2252)
   - Add fix for BinaryIO output (#2251)
   - Truncate app args if long (#2244)
   - Fix: run repl with jdk11 (#2246)
   - Reduce guava usage (#2223) (#2242)
   - Use upstream magnolia for scala >= 2.12, fix #2192 (#2241)
   - Restore sbt < 1.3.0 behaviour by not closing classloader, fix #2096
   (#2224)
   - Update version check message (#2227)
   - Don't wrap exceptions in WrappedBCoder (#2266)
   - Bypass nullable coder for ObjectFileIO, fix #2187 (#2188)

Breaking Changes

   - Decouple SQL extension from scio-core (#2201)


SDK support status clarification

2019-07-11 Thread Neville Li
Hi all, more specifically Googlers here,

I want to clarify the Beam SDK support status w.r.t. Dataflow runner here:
https://cloud.google.com/dataflow/docs/support/sdk-version-support-status

When a Beam SDK is deprecated, what does it mean for users running it on
Dataflow?

The page mentions that SDK 2.0.0 to 2.4.0 "will be decommissioned in late
2019". Meaning that pipelines using these SDKs will stop running? Is there
a specific date?

Also similar question, are there planned decommission dates for SDK
versions > 2.4.0?

Cheers,
Neville


Re: AvroIO read SpecificRecord with custom reader schema?

2019-06-13 Thread Neville Li
That gives me a GenericRecord which is not type safe.
In my case I have the compiled SpecificRecord class, i.e. MyRecord,
available, but would like to pass in a schema other than
MyRecord.getClassSchema() to, say, populate a subset of the fields.

On Thu, Jun 13, 2019 at 6:18 PM Chamikara Jayalath 
wrote:

> Does AvroIO.readGenericRecords() work ?
>
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/io/AvroIO.java#L333
>
> Thanks,
> Cham
>
> On Thu, Jun 13, 2019 at 1:46 PM Neville Li  wrote:
>
>> Hi,
>>
>> Is it just me or is there no way for AvroIO to read SpecificRecords with
>> a custom reader schema?
>>
>> AvroIO.read(Class<T> recordClass) will use the schema of T and there's
>> no way to override it.
>>
>> Cheers,
>> Neville
>>
>


AvroIO read SpecificRecord with custom reader schema?

2019-06-13 Thread Neville Li
Hi,

Is it just me or is there no way for AvroIO to read SpecificRecords with a
custom reader schema?

AvroIO.read(Class<T> recordClass) will use the schema of T and there's no
way to override it.
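For readers unfamiliar with Avro schema resolution: the reader schema lets you deserialize with a different (for example narrower) schema than the one the data was written with, so only the requested fields are populated. Conceptually, with records modeled as plain maps rather than the real Avro API:

```scala
// Conceptual model of Avro reader-schema resolution: a reader schema
// listing a subset of fields yields records containing only those
// fields, with defaults filling anything the writer didn't provide.
// This is an illustration of the idea, not the Avro library itself.
def resolve(
    record: Map[String, Any],
    readerFields: Seq[String],
    defaults: Map[String, Any] = Map.empty
): Map[String, Any] =
  readerFields.map { f =>
    f -> record.getOrElse(f, defaults.getOrElse(f, null))
  }.toMap

val written = Map("id" -> 1, "name" -> "scio", "payload" -> "big blob")
val projected = resolve(written, Seq("id", "name"))
// projected == Map("id" -> 1, "name" -> "scio")
```

Being able to supply such a reader schema for a SpecificRecord class is exactly what the question above is asking for: keep the typed MyRecord interface but skip decoding fields you don't need.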

Cheers,
Neville


Scio 0.7.4 released

2019-03-25 Thread Neville Li
Hi all,

We just released Scio 0.7.4. The biggest change was upgrading to Apache
Beam 2.11.0.

https://github.com/spotify/scio/releases/tag/v0.7.4

*"Watsonula wautieri"*
Features

   - Add sequence example support to scio-tensorflow (#1757)
   - Add compile-time warning when Coder falls back to Kryo on GenericRecord
   (#1768)
   - Bump Beam to 2.11 (#1739)
   - Update protobuf version to 3.7.0 (#1752)
   - Bump GCS connector version (#1766)
   - Bump cassandra-driver-core to 3.7.1 (#1760)

Bug Fixes & Improvements

   - Update coursier (#1751)
   - Fix project ID in Spanner admin client (#1771)
   - Fix Spanner IT test
   - Refactor Tensorflow SCollection syntax (#1763)
   - Rename Avro ScioContext ops class (#1765)
   - Add explicit project to fix failing ScioIOBenchmark (#1764)
   - Rename/unify TF save function names (#1762)
   - Add FAQ for IntelliJ SBT heap
   - Remove outdated maintainer notes in FAQ.md
   - Minor doc site fix (#1759)
   - Refactor avro syntax (#1753)
   - Avro IO with type bound (#1737)
   - Update release docs


Scio 0.6.1 released

2018-09-12 Thread Neville Li
Hi all,

We just released Scio 0.6.1. This is mainly a bug fix release.
Also just a heads up, we'll be releasing alpha/beta of 0.7.0 soon which
will include some major breaking changes. Keep an eye on this section

of the changelog for updates.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.6.1

*"Rhyncholestes raphanurus"*
Features

   - Expose ScioResult#isTest #1336
   - Meaningful NPE message for nulls in BigQueryType macro #1303 #1332
   - Add streaming benchmark #1294 #1333 #1338 #1353
   - Bump ASM version #1358

Bug fixes

   - Make PCollection names unique #1356
   - Register joda-time serializers #1341 #1347
   - Fix duplicate jars in classpath #1334 #1348
   - Use location-aware Dataflow job endpoints #1337
   - Change DoFnWithResource logging level to DEBUG #1351
   - Cache schema in AvroType macro #1025 #1359


Scio 0.5.6 released

2018-07-25 Thread Neville Li
Hi all,

We just released Scio 0.5.6. This release includes a lot of improvements,
bug fixes, and is based on the latest Beam 2.5.0.

https://github.com/spotify/scio/releases/tag/v0.5.6

*"Orcinus orca"*
Breaking changes

   - Beam 2.5.0 introduced a breaking change in the metrics API. See commit
   e664dfc for details

Features

   - Bump Beam to 2.5.0 #1222
   - Bump featran to 0.2.0 #1255
   - Bump dependency versions #1213
   - Add topic function to PubSubAdmin #1216
   - Add BigQueryClient#createTable and BigQueryClient#createTypedTable
   #1195
   - Add support for Distinct with representative values to SCollections
   #1241
   - Add timer to scalatest tests #1201
   - Tweak saveAsTfExample #1250
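On "Distinct with representative values" (#1241): the idea is to deduplicate by a key function, keeping one element per representative key rather than requiring whole elements to be equal. A plain-Scala sketch of the semantics (names illustrative, not Scio's API):

```scala
// Deduplicate by a representative key: keep the first element seen for
// each key. This mirrors the distinct-with-representative-values idea.
def distinctBy[T, K](xs: Seq[T])(key: T => K): Seq[T] = {
  val seen = scala.collection.mutable.Set.empty[K]
  xs.filter(x => seen.add(key(x))) // add returns false for duplicates
}

case class Track(id: Int, title: String)
val tracks = Seq(Track(1, "a"), Track(2, "b"), Track(1, "a (remaster)"))
val unique = distinctBy(tracks)(_.id)
// unique == Seq(Track(1, "a"), Track(2, "b"))
```

In a distributed setting the "first seen" element per key is of course arbitrary, which is why the representative value only needs to identify duplicates, not pick a canonical one.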

Bug fixes

   - Optimize kryo serialization #1248
   - Remove unnecessary memoize on kryo state input/output #1245
   - Set kryo options before loading registrars #1209
   - Clean up KryoAtomicCoder #1220 #1230
   - Normalize feature names in TFRecordSpec #1214 #1215
   - Fix tensorflow/zoltar dependencies #1212
   - Remove explicit google-api-services-storage version dependency #1235
   - Do not cache SideInput for Global windows #1190 #1231
   - Use SingletonSideInput in cross #1211
   - Make default for args.getOrElse call-by-name #1225
   - Add username to TypeProvider path #1205 #1244
   - Enforce protobuf-java 3.3.1 for scio-tensorflow #1240
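On the call-by-name getOrElse fix (#1225): with a by-value default parameter, the default expression is evaluated at the call site even when the key is present; declaring it by-name (`=> T`) defers evaluation until it is actually needed. A small sketch of the difference (hypothetical wrappers, not Scio's Args API):

```scala
// Why call-by-name matters for getOrElse-style defaults: a by-value
// default is computed even when the key exists; `=> T` defers it.
var evaluations = 0
def expensiveDefault(): String = { evaluations += 1; "fallback" }

// By-value: the default argument is evaluated before the call.
def getOrElseByValue(m: Map[String, String], k: String, default: String): String =
  m.getOrElse(k, default)

// By-name: the default expression is only evaluated if the key is absent.
def getOrElseByName(m: Map[String, String], k: String, default: => String): String =
  m.getOrElse(k, default)

val args = Map("input" -> "gs://bucket/path")
getOrElseByValue(args, "input", expensiveDefault()) // default evaluated anyway
assert(evaluations == 1)
getOrElseByName(args, "input", expensiveDefault())  // default never evaluated
assert(evaluations == 1)
```

For command-line args the default is often an expensive lookup or a `sys.error(...)`, so evaluating it eagerly is wasteful or outright wrong.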


Scio 0.5.4 released

2018-05-14 Thread Neville Li
Hi all,

We just released Scio 0.5.4. This release includes 2 important
serialization fixes and a few new features. We recommend upgrading if
you're on 0.5.x.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.5.4

*"Marmota monax"*
Features

   - Add satisfySingleValue SCollection matcher #1150
   - Add a partition-by-predicate combinator to SCollection #1153
   - Limit parallelism in DoFns #1151
   - Add MiMa plugin #1132
   - Add support for beam parameters in JobTest #1145
   - Add basic linting scalac options #1097 #1147 #1148

Bug fixes

   - Add MiMa plugin #1132
   - Fix serialization of large ByteString #1136 #1140
   - Replace Kryo ThreadLocal with object pool #1143
   - Fix string representation in BigQuery type provider #1154


Scio 0.5.2 released

2018-04-05 Thread Neville Li
Hi all,

We just released Scio 0.5.2 with a few enhancements and bug fixes.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.5.2

*"Kobus kob"*
Features

   - Add Java Converters #1013 #1076
   - Add unionAll on ScioContext to support empty lists #1092 #1095
   - Support custom data validation in type-safe BigQuery #1075

Bug fixes

   - Fix scio-repl on Windows #1093
   - Fix missing schema on AvroIO TypedWrite #1088 #1089


Re: Scio 0.5.0 released

2018-03-09 Thread Neville Li
On Fri, Mar 9, 2018 at 4:31 PM 'Eugene Kirpichov' via Scio Users <
scio-us...@googlegroups.com> wrote:

> Hi!
>
> On Fri, Mar 9, 2018 at 1:22 PM Rafal Wojdyla  wrote:
>
>> Hi all,
>>
>> We have just released Scio 0.5.0. This is a major/breaking release - make
>> sure to read the breaking changes section below.
>>
>> - rav
>>
>> https://github.com/spotify/scio/releases/tag/v0.5.0
>>
>> *"In ictu"*
>>
>> Breaking changes
>>
>>- BigQueryIO in JobTest#output now requires a type parameter. Explicit
>> .map(T.toTableRow) of test data is no longer needed.
>>- Typed AvroIO now accepts case classes instead of Avro records in
>>JobTest. Explicit .map(T.toGenericRecord) of test data is no longer
>>needed. See this change
>>
>> 
>> for more.
>>- Package com.spotify.scio.extra.transforms is moved from scio-extra
>>to scio-core, under com.spotify.scio.transforms.
>>
>> See this section for more details.
>> Features
>>
>>- Support reading BigQuery as Avro #964
>>, #992
>>
>>- BigQuery client now supports load and export #1060
>>
>>- Add TFRecordSpec support for Featran #1002
>>
>>- Add AsyncLookupDoFn #1012
>>
>>
>> This looks like a very useful general-purpose tool; Beam users have asked
> for something like this many times, and the implementation looks very
> high-quality as well. Any interest in backporting it into Beam?
>

Sounds like a good idea. We'll look into it. It might take a while, though,
since our tests are in Scala.
https://github.com/spotify/scio/issues/1071

>
>
>>
>>- Bump sparkey to 2.2.1, protobuf-generic to 0.2.4 #1028
>>
>>- Added ser/der support for joda DateTime #1038
>>
>>- Password is now optional for jdbc connection #1040
>>
>>- Add job cancellation option to ScioResult#waitUntilDone #1056
>> #1058
>> #1062
>> #1066
>>
>>
>> Bug fixes
>>
>>- Fix transform name in joins #1034
>> #1035
>>
>>- Add applyKvTransform #1020
>> #1032
>>
>>- Add helpers to initialize counters #1026
>> #1027
>>
>>- Fix SCollectionMatchers serialization #1001
>>
>>- Check runner version #1008
>> #1009
>>
>>- Log exception in AsyncLookupDoFn only if cache put fails #1039
>>
>>- ProjectId nonEmpty string check in BigQueryClient #1045
>>
>>- Fix SCollection#withSlidingWindows #1054
>>
>>
>>


Fwd: Scio 0.5.0-alpha2 released

2018-01-29 Thread Neville Li
+user@beam.apache.org 


-- Forwarded message -
From: Neville Li 
Date: Mon, Jan 29, 2018 at 6:54 PM
Subject: Scio 0.5.0-alpha2 released
To: d...@beam.apache.org 


Hi all,

We just released Scio 0.5.0-alpha2. This is mostly a bug-fix release. We'll
probably have one or two beta releases with the upcoming Beam 2.3.0. Stay
tuned!

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.5.0-alpha2

Breaking changes

   - BigQueryIO in JobTest#output now requires a type parameter. Explicit
   .map(T.toTableRow) of test data is no longer needed.
   - Typed AvroIO now accepts case classes instead of Avro records in
   JobTest. Explicit .map(T.toGenericRecord) of test data is no longer
   needed. See this change
   <https://github.com/spotify/scio/commit/19fee4716f71827ac4affbd23d753bc074c529b8>
   for more.
   - Package com.spotify.scio.extra.transforms is moved from scio-extra to
   scio-core, under com.spotify.scio.transforms.

See this section
<https://github.com/spotify/scio/wiki/Apache-Beam#breaking-changes-since-scio-050>
for more details.
Features

   - Remove toGenericRecord requirement when testing typed AvroIO #1022
   <https://github.com/spotify/scio/issues/1022> #1036
   <https://github.com/spotify/scio/pull/1036>
   - Bump sparkey to 2.2.1, protobuf-generic to 0.2.4 #1028
   <https://github.com/spotify/scio/pull/1028>

Bug fixes

   - Fix transform name in joins #1034
   <https://github.com/spotify/scio/issues/1034> #1035
   <https://github.com/spotify/scio/pull/1035>
   - Add applyKvTransform #1020
   <https://github.com/spotify/scio/issues/1020> #1032
   <https://github.com/spotify/scio/pull/1032>
   - Add helpers to initialize counters #1026
   <https://github.com/spotify/scio/issues/1026> #1027
   <https://github.com/spotify/scio/pull/1027>


Re: Trying to understand Unable to encode element exceptions

2018-01-25 Thread Neville Li
Here's a fix to #1020
https://github.com/spotify/scio/pull/1032

On Sun, Jan 21, 2018 at 4:36 PM Neville Li  wrote:

> Awesome!
> We haven't wrapped any stateful processing API in Scala, but if you have a
> working snippet or ideas it'd be great to share them in that ticket.


Re: Trying to understand Unable to encode element exceptions

2018-01-21 Thread Neville Li
Awesome!
We haven't wrapped any stateful processing API in Scala, but if you have a
working snippet or ideas it'd be great to share them in that ticket.

On Sat, Jan 20, 2018 at 4:31 PM Carlos Alonso  wrote:

> Thanks Neville!!
>
> Your recommendation worked great. Thanks for your help!!
>
> As a side note, I found this issue:
> https://github.com/spotify/scio/issues/448
>
> I can share/help there with our experience, as our job, with scio +
> stateful + timely processing is working fine as of today
>
> Regards!!


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
Welcome.

Added an issue so we may improve this in the future:
https://github.com/spotify/scio/issues/1020

On Fri, Jan 19, 2018 at 11:14 AM Carlos Alonso  wrote:

> To build the beam transform I was following this example:
> https://github.com/spotify/scio/blob/master/scio-examples/src/main/scala/com/spotify/scio/examples/extra/DoFnExample.scala
>
> To be honest I don't know how to apply timely and stateful processing
> without using a beam transform or how to rewrite it using the scio built-in
> you suggest. Could you please give me an example?
>
> Thanks for your help!


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
Didn't realize the map is in a case class, which is serializable, but
`java.util.Map` is not, so this won't work transitively.
Your best bet is to write a custom Coder for the entire case class (you can
compose a map coder for the map field) and set it as part of the KvCoder.
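
The advice above — compose per-field coders into one coder for the whole case class — can be sketched with just the standard library. A real Beam coder would extend `CustomCoder` and delegate to `StringUtf8Coder` and `MapCoder`; this stdlib stand-in (all names illustrative) only shows the encode/decode composition:

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

case class MessageWithAttributes(content: String, attrs: Map[String, String])

// Field-by-field encoding: a "string coder" for `content`,
// then a "map coder" (size + key/value pairs) for `attrs`.
object MessageCodec {
  def encode(m: MessageWithAttributes): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DataOutputStream(bos)
    out.writeUTF(m.content)
    out.writeInt(m.attrs.size)
    m.attrs.foreach { case (k, v) => out.writeUTF(k); out.writeUTF(v) }
    out.flush()
    bos.toByteArray
  }

  def decode(bytes: Array[Byte]): MessageWithAttributes = {
    val in = new DataInputStream(new ByteArrayInputStream(bytes))
    val content = in.readUTF()
    val n = in.readInt()
    val attrs = (1 to n).map(_ => in.readUTF() -> in.readUTF()).toMap
    MessageWithAttributes(content, attrs)
  }
}
```

In Beam proper, the same composition would then be set on the pipeline via something like `KvCoder.of(StringUtf8Coder.of(), myCaseClassCoder)`.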

On Fri, Jan 19, 2018 at 11:22 AM Carlos Alonso  wrote:

> You mean replacing the Map[String, String] from the case class into a
> java.util.Map? And then, how could I set that
> MapCoder for that bit?
>
> Sorry if those questions are too newbie, but this is my first experience
> with Beam...
>
> Thanks!


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
In this case it's probably easiest to map the scala `Map[K, V]` into a
`java.util.Map` and explicitly set a `MapCoder` so you don't
have to deal with internal coder inference.

On Fri, Jan 19, 2018 at 11:03 AM Neville Li  wrote:

> That happens when you mix beam transforms into scio and defeats the safety
> we have in place. Map the values into something beam-serializable first or
> rewrite the transform with a scio built-in which takes care of KvCoder.


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
That happens when you mix beam transforms into scio and defeats the safety
we have in place. Map the values into something beam-serializable first or
rewrite the transform with a scio built-in which takes care of KvCoder.

On Fri, Jan 19, 2018, 10:56 AM Carlos Alonso  wrote:

> I'm following this example:
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/GroupIntoBatches.java#L60
>
> because I'm building something very similar to a group into batches
> functionality. If I don't set the coder manually, this exception arises:
> https://pastebin.com/xxdDMXSf
>
> Thanks!


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
You shouldn't manually set a coder in most cases. It defaults to
KryoAtomicCoder for most Scala types.
More details:
https://github.com/spotify/scio/wiki/Scio%2C-Beam-and-Dataflow#coders

On Fri, Jan 19, 2018, 10:27 AM Carlos Alonso  wrote:

> May it be because I’m using
> .setCoder(KvCoder.of(StringUtf8Coder.of(),
> CoderRegistry.createDefault().getCoder(classOf[MessageWithAttributes]))) at
> some point in the pipeline
> (CoderRegistry.createDefault().getCoder(classOf[MessageWithAttributes])
> outputs a SerializableCoder)?
>
> This is something I've always wondered. How does one specify a coder for a
> case class?
>
> Regards


Re: Trying to understand Unable to encode element exceptions

2018-01-19 Thread Neville Li
Not sure why it falls back to SerializableCoder. Can you file a GH issue,
ideally with a snippet that can reproduce the problem?

On Fri, Jan 19, 2018, 7:43 AM Carlos Alonso  wrote:

> Hi everyone!!
>
> I'm building a pipeline to store items from a Google PubSub subscription
> into GCS buckets. In order to do it I'm using both stateful and timely
> processing and after building and testing the project locally I tried to
> run it on Google Dataflow and I started getting those errors.
>
> The full stack trace is here: https://pastebin.com/LqecPhsq
>
> The item I'm trying to serialize is a KV[String, MessageWithAttributes]
> and MessageWithAttributes is a case class defined as (content: String,
> attrs: Map[String, String])
>
> The underlying cause is java.io.NotSerializableException:
> com.spotify.scio.util.JMapWrapper$$anon$2 (yes, I'm using Spotify's Scio as
> well), which may suggest that the issue is in serializing the Map, but to be
> honest, I don't know what it means or how to fix it.
>
> Can anyone help me, please?
> Thanks!
>


Scio 0.5.0-alpha1 is out

2018-01-17 Thread Neville Li
Hi all,

We just released Scio 0.5.0-alpha1. This release includes a typed BigQuery
performance improvement by bypassing intermediate TableRow JSONs. It has
shown a 2x speedup in some of our benchmarks.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.5.0-alpha1

*"Ia Io"*
Breaking changes

   - BigQueryIO in JobTest#output now requires a type parameter. Explicit
   .map(T.toTableRow) of test data is no longer needed.
   - Package com.spotify.scio.extra.transforms is moved from scio-extra to
   scio-core, under com.spotify.scio.transforms.

See this section for more details.
Features

   - Support reading BigQuery as Avro #964, #992
   - Add TFRecordSpec support for Featran #1002
   - Add AsyncLookupDoFn #1012

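AsyncLookupDoFn pairs each element with the result of an asynchronous lookup, caching results so repeated keys don't re-hit the backing service. A minimal stdlib sketch of that caching pattern (illustrative only; the real class integrates with the Beam DoFn lifecycle):

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Async lookup with a result cache: a hit short-circuits to an
// already-completed Future; a miss performs the lookup and caches it.
class CachedAsyncLookup[K, V](lookup: K => Future[V]) {
  private val cache = TrieMap.empty[K, V]
  def apply(key: K): Future[V] = cache.get(key) match {
    case Some(v) => Future.successful(v)
    case None    => lookup(key).map { v => cache.put(key, v); v }
  }
}

var calls = 0
val cached = new CachedAsyncLookup[String, Int](k => Future { calls += 1; k.length })
```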
Bug fixes

   - Fix SCollectionMatchers serialization #1001
   - Check runner version #1008 #1009


Re: Scio 0.4.7 released

2018-01-04 Thread Neville Li
Not sure what you mean. We use Beam file IOs whenever possible, which should
already use the FileSystems API.

On Thu, Jan 4, 2018 at 3:48 PM Jean-Baptiste Onofré  wrote:

> Hi Neville,
>
> Great  Thanks for the update.
>
> By the way, any plan to leverage the Beam filesystems in Scio now ?
>
> Regards
> JB
>
> On 01/04/2018 09:41 PM, Neville Li wrote:
> > Hi all,
> >
> > We just released Scio 0.4.7. This release fixed a join performance
> regression and
> > introduced several improvements.
> >
> > https://github.com/spotify/scio/releases/tag/v0.4.7
> >
> > /"Hydrochoerus hydrochaeris"/
> >
> >
> >   Features
> >
> >   * Add support for TFRecordSpec #990 <
> https://github.com/spotify/scio/pull/990>
> >   * Add convenience methods to randomSplit #987
> > <https://github.com/spotify/scio/pull/987>
> >   * Add BigtableDoFn #922 <https://github.com/spotify/scio/issues/922>
> #931
> > <https://github.com/spotify/scio/pull/931>
> >   * Add optional arguments validation #979
> > <https://github.com/spotify/scio/pull/979>
> >   * Performance improvement in Avro and BigQuery macro converters #989
> > <https://github.com/spotify/scio/issues/989>
> >   * Update new bigquery datetime and timestamp specification #982
> > <https://github.com/spotify/scio/pull/982>
> >
> >
> >   Bug fixes
> >
> >   * Fix join performance regression introduced in 0.4.4 #976
> > <https://github.com/spotify/scio/issues/976> #983
> > <https://github.com/spotify/scio/pull/983>
> >   * Fine tune dynamic sink API and integration tests #993
> > <https://github.com/spotify/scio/issues/993> #995
> > <https://github.com/spotify/scio/pull/995>
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>


Scio 0.4.7 released

2018-01-04 Thread Neville Li
Hi all,

We just released Scio 0.4.7. This release fixed a join performance
regression and introduced several improvements.

https://github.com/spotify/scio/releases/tag/v0.4.7

*"Hydrochoerus hydrochaeris"*
Features

   - Add support for TFRecordSpec #990
   
   - Add convenience methods to randomSplit #987
   
   - Add BigtableDoFn #922 #931
   
   - Add optional arguments validation #979
   
   - Performance improvement in Avro and BigQuery macro converters #989
   
   - Update to the new BigQuery datetime and timestamp specification #982
   
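
The randomSplit convenience methods (#987) split a collection into several by weight, the classic use case being train/test sets. The underlying mechanics can be modeled in a few lines of plain Scala (this is a sketch of the idea, not the actual Scio signatures):

```scala
import scala.util.Random

// Assign each element to a bucket with probability proportional to
// its weight, using cumulative weight boundaries.
def randomSplit[T](xs: Seq[T], weights: Seq[Double], seed: Long = 42L): Seq[Seq[T]] = {
  val rng        = new Random(seed)
  val total      = weights.sum
  val cumulative = weights.map(_ / total).scanLeft(0.0)(_ + _).tail
  val buckets    = Array.fill(weights.size)(Seq.newBuilder[T])
  xs.foreach { x =>
    val r = rng.nextDouble()
    val i = cumulative.indexWhere(r < _)
    buckets(if (i < 0) weights.size - 1 else i) += x
  }
  buckets.map(_.result()).toSeq
}

val Seq(train, test) = randomSplit(1 to 100, Seq(0.8, 0.2))
// Roughly an 80/20 split; exact bucket sizes vary with the seed.
```

In Scio the same shape applies to an SCollection, with the split happening per element across workers rather than in memory.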

Bug fixes

   - Fix join performance regression introduced in 0.4.4 #976
    #983
   
   - Fine-tune dynamic sink API and integration tests #993
    #995
   


Fwd: Scio 0.4.6 released

2017-12-19 Thread Neville Li
Hi all,

First cross-post here but figured it makes sense. Scio is a Scala API for
Apache Beam and Google Cloud Dataflow that we use heavily at Spotify.

Check out the wiki for more information:
https://github.com/spotify/scio/wiki

Cheers,
Neville

-- Forwarded message -
From: 'Neville Li' via Scio Users 
Date: Tue, Dec 19, 2017 at 1:12 PM
Subject: Scio 0.4.6 released
To: 


Hi all,

We just released Scio 0.4.6. It's based on Beam 2.2.0 now with a lot of
improvements and bug fixes.

Cheers,
Neville

https://github.com/spotify/scio/releases/tag/v0.4.6

*"Galago gallarum"*
Features

   - Upgrade Beam to 2.2.0 #797 <https://github.com/spotify/scio/issues/797>
#958 <https://github.com/spotify/scio/pull/958>
   - Support dynamic file IO destinations #919
   <https://github.com/spotify/scio/issues/919> #965
   <https://github.com/spotify/scio/pull/965>
   - Support custom Kryo options via PipelineOptions #896
   <https://github.com/spotify/scio/issues/896> #955
   <https://github.com/spotify/scio/pull/955>
   - Propagate input to TensorFlow predict output fn
   - Annotate more examples with Socco
   - Support compression in TextIO #972
   <https://github.com/spotify/scio/pull/972>
   - Use Compression in TFRecordIO #977
   <https://github.com/spotify/scio/pull/977>
   - Add TSV examples #974 <https://github.com/spotify/scio/pull/974>
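
The compression support in TextIO and TFRecordIO (#972, #977) surfaces Beam's `Compression` enum through the save methods. A rough, uncompiled sketch of what that usage looks like (parameter names are my assumption from later Scio releases; check the scaladoc for this version):

```scala
// Illustrative only: write gzip-compressed text output with Scio.
import com.spotify.scio.ContextAndArgs
import org.apache.beam.sdk.io.Compression

def main(argv: Array[String]): Unit = {
  val (sc, args) = ContextAndArgs(argv)
  sc.textFile(args("input"))
    .map(_.toUpperCase)
    .saveAsTextFile(args("output"), compression = Compression.GZIP)
  sc.close()
}
```

Reading compressed input generally needs no flag at all, since Beam's file IOs detect compression from the file extension by default.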

Bug fixes

   - Use window as side input cache key #959
   <https://github.com/spotify/scio/issues/959> #960
   <https://github.com/spotify/scio/pull/960>
   - Use canonical path in macro type providers #975
   <https://github.com/spotify/scio/pull/975>
   - Fix deduplication in SCollection#subtract #973
   <https://github.com/spotify/scio/pull/973>
   - Fix empty RHS for hashJoin and hashLeftJoin #953
   <https://github.com/spotify/scio/pull/953>
   - Fix ClassNotFound issue with ClosureCleaner
   - Lift projection in ParquetAvroFile#flatMap
   - Add Dataflow runner to scio-examples #963
   <https://github.com/spotify/scio/pull/963> #968
   <https://github.com/spotify/scio/pull/968>
   - Remove deprecated Pubsub ClientAuthInterceptor #957
   <https://github.com/spotify/scio/issues/957> #962
   <https://github.com/spotify/scio/pull/962>
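
The hashJoin/hashLeftJoin fix (#953) concerns joins where the right-hand side is small enough to ship to every worker as an in-memory map. The semantics can be modeled in plain Scala (a conceptual model, not the Scio implementation):

```scala
// Model of hashLeftJoin: keep every left key, matched against an
// in-memory map of the (small) right side. The #953 fix ensured this
// also behaves correctly when the right side is empty.
def hashLeftJoin[K, V, W](lhs: Seq[(K, V)], rhs: Seq[(K, W)]): Seq[(K, (V, Option[W]))] = {
  val hash: Map[K, Seq[W]] = rhs.groupBy(_._1).mapValues(_.map(_._2)).toMap
  lhs.flatMap { case (k, v) =>
    hash.get(k) match {
      case Some(ws) => ws.map(w => (k, (v, Some(w))))
      case None     => Seq((k, (v, None)))
    }
  }
}

hashLeftJoin(Seq("a" -> 1, "b" -> 2), Seq("a" -> 10))
// -> Seq(("a", (1, Some(10))), ("b", (2, None)))
hashLeftJoin(Seq("a" -> 1), Seq.empty[(String, Int)])
// -> Seq(("a", (1, None)))   -- the empty-RHS case from #953
```

Because the right side is broadcast rather than shuffled, this avoids a full co-group when one input is a small lookup table.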
