Re: Intellij Issue in Imports

2019-12-30 Thread Zohaib Baig
Thank you Kirill and Maximilian for your replies. I solved the issue by running
"./gradlew clean" and then refreshing the Gradle project.

Zohaib

On Sun, Dec 29, 2019 at 6:07 PM Maximilian Michels  wrote:

> I have this issue from time to time when pulling in the latest master.
> I've found that the only way to resolve this is to run "./gradlew clean",
> then refresh the Gradle project inside IntelliJ and run "compileTestJava"
> for the project which had the issues.
>
> Cheers,
> Max
>
> On December 27, 2019 8:20:05 AM GMT+01:00, Zohaib Baig <
> zohaib.b...@venturedive.com> wrote:
>>
>> Hi,
>>
>> Following the documentation, I have set up the Beam project from scratch
>> in IntelliJ. It seems some files have import issues and could not be
>> built, so eventually I wasn't able to test through the IDE (working on
>> Windows).
>>
>> Is there any other configuration that I am missing?
>>
>> Thank you.
>>
>> [image: image.png]
>>
>>
>>

-- 

*Muhammad Zohaib Baig*
Senior Software Engineer
Mobile: +92 3443060266
Skype: mzobii.baig




Re: External transform API in Java SDK

2019-12-30 Thread Luke Cwik
On Mon, Dec 23, 2019 at 12:20 PM Heejong Lee  wrote:

>
>
> On Fri, Dec 20, 2019 at 11:38 AM Luke Cwik  wrote:
>
>> What do side inputs look like?
>>
>
> A user first needs to pass the PCollections for side inputs into the external
> transform, in addition to the ordinary input PCollections, and then define the
> PCollectionViews inside the external transform, something like:
>
> PCollectionTuple pTuple =
>     PCollectionTuple.of("main1", main1)
>         .and("main2", main2)
>         .and("side", side)
>         .apply(External.of(...).withMultiOutputs());
>
> public static class TestTransform
>     extends PTransform<PCollectionTuple, PCollectionTuple> {
>   @Override
>   public PCollectionTuple expand(PCollectionTuple input) {
>     PCollectionView<String> sideView =
>         input.get("side").apply(View.asSingleton());
>     PCollection<String> main =
>         PCollectionList.of(input.get("main1"))
>             .and(input.get("main2"))
>             .apply(Flatten.pCollections())
>             .apply(
>                 ParDo.of(
>                         new DoFn<String, String>() {
>                           @ProcessElement
>                           public void processElement(
>                               @Element String x,
>                               OutputReceiver<String> out,
>                               DoFn.ProcessContext c) {
>                             out.output(x + c.sideInput(sideView));
>                           }
>                         })
>                     .withSideInputs(sideView));
>
>
>
>> On Thu, Dec 19, 2019 at 4:39 PM Heejong Lee  wrote:
>>
>>> I wanted to know if anybody has any comments on the external transform API
>>> for the Java SDK.
>>>
>>> `External.of()` creates an external transform for the Java SDK. Depending on
>>> the input and output types, two additional methods are provided:
>>> `withMultiOutputs()`, which specifies the type of the PCollection, and
>>> `withOutputType()`, which specifies the type of the output element. Some
>>> examples are:
>>>
>>> PCollection<String> col =
>>>     testPipeline
>>>         .apply(Create.of("1", "2", "3"))
>>>         .apply(External.of(...));
>>>
>>> This is okay without additional methods since 1) the input and output types
>>> of the external transform can be inferred and 2) the output PCollection is
>>> singular.
>>>
>>
>> How does the type/coder get inferred at runtime (doesn't Java's type
>> erasure get rid of this information)?
>>
>
>>
>>> PCollectionTuple pTuple =
>>>     testPipeline
>>>         .apply(Create.of(1, 2, 3, 4, 5, 6))
>>>         .apply(External.of(...).withMultiOutputs());
>>>
>>> This requires `withMultiOutputs()` since the output PCollection is a
>>> PCollectionTuple.
>>>
>>
>> Shouldn't this require a mapping from "output" name to coder/type
>> variable to be specified as an argument to withMultiOutputs?
>>
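(A hedged sketch only of what such a name-to-type mapping could look like; the
overload, output names, and element types below are made up for illustration and
are not part of the proposal:)

    // Hypothetical sketch: a withMultiOutputs() variant taking an explicit
    // output-name -> type mapping, so output coders don't have to be guessed.
    PCollectionTuple pTuple =
        testPipeline
            .apply(Create.of(1, 2, 3, 4, 5, 6))
            .apply(
                External.of(...)
                    .withMultiOutputs(
                        ImmutableMap.<String, TypeDescriptor<?>>of(
                            "even", TypeDescriptors.integers(),
                            "odd", TypeDescriptors.integers())));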
>>
>>> PCollection<String> pCol =
>>>     testPipeline
>>>         .apply(Create.of("1", "2", "2", "3", "3", "3"))
>>>         .apply(
>>>             External.of(...)
>>>                 .<KV<String, Long>>withOutputType())
>>>         .apply(
>>>             "toString",
>>>             MapElements.into(TypeDescriptors.strings())
>>>                 .via(x -> String.format("%s->%s", x.getKey(), x.getValue())));
>>>
>>> This requires `withOutputType()` since the output element type cannot
>>> be inferred from method chaining. I think some users may find it awkward to
>>> call a method with only a type parameter and empty parentheses. Without
>>> `withOutputType()`, the output element type will be java.lang.Object,
>>> which might still be forcefully cast to KV.
>>>
>>
>> How does the output type get preserved in this case (since Java's type
>> erasure would remove the KV<String, Long> type parameter after compilation,
>> and coder inference, in my opinion, should be broken and/or end up choosing
>> something generic like Serializable)?
>>
>
> The expansion service is responsible for using cross-language compatible
> coders in the returned expanded transforms, and these are the coders used
> at runtime. The type information annotated via the additional methods here is
> only for the compile-time type safety of external transforms.
>

Note that *.<KV<String, Long>>withOutputType()* could be changed to
*.withOutputType()* and we would still get a *PCollection* since
*withOutputType* doesn't actually do anything at runtime and is just there to
make the types align during compilation.
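
(A minimal sketch of how such a compile-time-only hint could be implemented; the
class and method below are illustrative only and are not the actual Beam API:)

    // Hypothetical sketch: withOutputType() only changes the compile-time view
    // of the output type via an unchecked cast; nothing is recorded for runtime
    // use, so the coder still comes entirely from the expansion service.
    class ExternalSketch<OutputT> extends PTransform<PBegin, PCollection<OutputT>> {
      @SuppressWarnings("unchecked")
      <NewT> ExternalSketch<NewT> withOutputType() {
        return (ExternalSketch<NewT>) this;
      }

      @Override
      public PCollection<OutputT> expand(PBegin input) {
        throw new UnsupportedOperationException("expanded by the expansion service");
      }
    }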

Is there a way to ensure that the output type is actually compatible with
the coder that was returned after expansion (this would likely require you
to pass in typing information into *withOutputType*, see
TypeDescriptors[1])?

1:
https://github.com/apache/beam/blob/4c18cb4ada2650552a0006dfffd68d0775dd76c6/sdks/java/core/src/main/java/org/apache/beam/sdk/values/TypeDescriptors.java
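
(As a hedged illustration of that idea only — a withOutputType(TypeDescriptor)
overload like the one below does not exist; the signature is made up:)

    // Hypothetical sketch: passing a reified TypeDescriptor instead of only a
    // type parameter would let the SDK compare the declared output type against
    // the coder returned by the expansion service and fail fast on a mismatch.
    PCollection<KV<String, Long>> pCol =
        testPipeline
            .apply(Create.of("1", "2", "2", "3", "3", "3"))
            .apply(
                External.of(...)
                    .withOutputType(
                        TypeDescriptors.kvs(
                            TypeDescriptors.strings(), TypeDescriptors.longs())));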


>>
> Thanks,
>>> Heejong
>>>
>>


Beam Dependency Check Report (2019-12-30)

2019-12-30 Thread Apache Jenkins Server

High Priority Dependency Updates Of Beam Python SDK:


  Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release | JIRA Issue
  cachetools | 3.1.1 | 4.0.0 | 2019-12-23 | 2019-12-23 | BEAM-9017
  google-cloud-bigquery | 1.17.1 | 1.23.1 | 2019-09-23 | 2019-12-23 | BEAM-5537
  google-cloud-datastore | 1.7.4 | 1.10.0 | 2019-05-27 | 2019-10-21 | BEAM-8443
  httplib2 | 0.12.0 | 0.15.0 | 2018-12-10 | 2019-12-23 | BEAM-9018
  mock | 2.0.0 | 3.0.5 | 2019-05-20 | 2019-05-20 | BEAM-7369
  oauth2client | 3.0.0 | 4.1.3 | 2018-12-10 | 2018-12-10 | BEAM-6089
  pytest | 4.6.8 | 5.3.2 | 2019-12-23 | 2019-12-23 | BEAM-8606
  Sphinx | 1.8.5 | 2.3.1 | 2019-05-20 | 2019-12-23 | BEAM-7370
  tenacity | 5.1.5 | 6.0.0 | 2019-11-11 | 2019-11-11 | BEAM-8607

High Priority Dependency Updates Of Beam Java SDK:


  Dependency Name | Current Version | Latest Version | Release Date Of The Current Used Version | Release Date Of The Latest Release | JIRA Issue
  com.alibaba:fastjson | 1.2.49 | 1.2.62 | 2018-08-04 | 2019-10-07 | BEAM-8632
  com.datastax.cassandra:cassandra-driver-core | 3.6.0 | 4.0.0 | 2018-08-29 | 2019-03-18 | BEAM-8674
  com.datastax.cassandra:cassandra-driver-mapping | 3.6.0 | 3.8.0 | 2018-08-29 | 2019-10-29 | BEAM-8749
  com.esotericsoftware:kryo | 4.0.2 | 5.0.0-RC4 | 2018-03-20 | 2019-04-14 | BEAM-5809
  com.esotericsoftware.kryo:kryo | 2.21 | 2.24.0 | 2013-02-27 | 2014-05-04 | BEAM-5574
  com.github.ben-manes.versions:com.github.ben-manes.versions.gradle.plugin | 0.20.0 | 0.27.0 | 2019-02-11 | 2019-10-21 | BEAM-6645
  com.github.spotbugs:spotbugs | 3.1.12 | 4.0.0-beta4 | 2019-03-01 | 2019-09-18 | BEAM-7792
  com.github.spotbugs:spotbugs-annotations | 3.1.12 | 4.0.0-beta4 | 2019-03-01 | 2019-09-18 | BEAM-6951
  com.google.api:gax-grpc | 1.38.0 | 1.52.0 | 2019-02-05 | 2019-12-13 | BEAM-8676
  com.google.api.grpc:grpc-google-cloud-datacatalog-v1beta1 | 0.27.0-alpha | 0.30.0-alpha | 2019-10-03 | 2019-11-20 | BEAM-8853
  com.google.api.grpc:grpc-google-cloud-pubsub-v1 | 1.43.0 | 1.84.0 | 2019-01-23 | 2019-12-04 | BEAM-8677
  com.google.api.grpc:grpc-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-8633
  com.google.api.grpc:proto-google-cloud-bigtable-v2 | 0.44.0 | 1.8.0 | 2019-01-23 | 2019-12-17 | BEAM-8679
  com.google.api.grpc:proto-google-cloud-datacatalog-v1beta1 | 0.27.0-alpha | 0.30.0-alpha | 2019-10-03 | 2019-11-20 | BEAM-8854
  com.google.api.grpc:proto-google-cloud-datastore-v1 | 0.44.0 | 0.85.0 | 2019-01-23 | 2019-12-05 | BEAM-8680
  com.google.api.grpc:proto-google-cloud-pubsub-v1 | 1.43.0 | 1.84.0 | 2019-01-23 | 2019-12-04 | BEAM-8681
  com.google.api.grpc:proto-google-cloud-spanner-admin-database-v1 | 1.6.0 | 1.47.0 | 2019-01-23 | 2019-12-05 | BEAM-8682
  com.google.api.grpc:proto-google-common-protos | 1.12.0 | 1.17.0 | 2018-06-29 | 2019-10-04 | BEAM-6899
  com.google.apis:google-api-services-bigquery | v2-rev20181221-1.28.0 | v2-rev20190917-1.30.3 | 2019-01-17 | 2019-10-09 | BEAM-8684
  com.google.apis:google-api-services-clouddebugger | v2-rev20181114-1.28.0 | v2-rev20191003-1.30.3 | 2019-01-17 | 2019-10-19 | BEAM-8750
  com.google.apis:google-api-services-cloudresourcemanager | v1-rev20181015-1.28.0 | v2-rev20191206-1.30.3 | 2019-01-17 | 2019-12-17 | BEAM-8751
  com.google.apis:google-api-services-dataflow | v1b3-rev20190927-1.28.0 | v1beta3-rev12-1.20.0 | 2019-10-11 | 2015-04-29 | BEAM-8752
  com.google.apis:google-api-services-pubsub | v1-rev20181213-1.28.0 | v1-rev20191203-1.30.3 | 2019-01-18 | 2019-12-18 | BEAM-8753
  com.google.apis:google-api-services-storage | v1-rev20181109-1.28.0 | v1-rev20191011-1.30.3 | 2019-01-18 | 2019-10-30 | BEAM-8754
  com.google.auth:google-auth-library-credentials | 0.13.0 | 0.19.0 | 2019-01-17 | 2019-12-13 | BEAM-6478
  com.google.auth:google-auth-library-oauth2-http | 0.12.0 | 0.19.0 | 2018-11-14 | 2019-12-13 | BEAM-8685
  com.google.cloud:google-cloud-bigquery | 1.28.0 |

Re: Request for review of PR [Beam-8564]

2019-12-30 Thread Amogh Tiwari
Hi Luke,

We have gone through shevek/lzo-java, but we chose to go with
airlift/aircompressor for the following reasons:

1) shevek/lzo-java internally uses JNI, .c and .h files, hence the GPL
licence, and that would leave us with the only choice of adding it as an
optional dependency

2) the performance of airlift/aircompressor was much better than
shevek/lzo-java in terms of compression ratio and time taken for
compression/decompression

3) airlift/aircompressor is pure Java and is under the Apache licence

Therefore, we'd prefer to go with adding the current implementation as
optional. We'd appreciate your input on this, as we are unsure where we are
supposed to keep the required files and what the final directory structure
would look like. We have an idea and we'll update the current PR
accordingly.

Please do guide us on this.


Regards,

Amogh Tiwari

On Wed, Dec 18, 2019 at 4:42 AM Luke Cwik  wrote:

> Sorry for the long delay (was on vacation).
>
> Using org.apache.hadoop isn't part of the Apache Beam Core module but is a
> dependency for those who depend on the Apache Beam Hadoop module. So I
> don't think swapping the com.facebook.presto.hadoop version for the
> org.apache.hadoop version will address Ismael's concern about including
> hadoop dependencies as part of the core.
>
> I looked at shevek/lzo-java[1] and I think it's our best choice since it is:
> * pure Java
> * GPLv3 (would require marking the dependency optional and telling users
> to add it explicitly which we have done in the past as well)
> * small (<100kb)
> * small dependency tree (commons-logging & findbugs annotations if we only
> depend on lzo-core)
> * performance (github page claims 500mb/s compression and 800mb/s
> decompression)
>
> Alternatively we can make the LZO compression an extension module (with
> the facebook dependency) and use a registrar to have it loaded dynamically.
>
> 1: https://github.com/shevek/lzo-java
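
(A rough sketch of what the registrar route could look like; the interface name
and methods below are made up for illustration and are not an existing Beam API:)

    // Hypothetical sketch: an LZO codec shipped in an optional extension module
    // would implement this interface and be discovered via java.util.ServiceLoader,
    // keeping the core SDK free of any compression-library dependency.
    public interface CompressionRegistrar {
      /** Algorithm name, e.g. "LZO". */
      String getName();

      /** Wraps a raw stream with a decompressing stream. */
      InputStream decompress(InputStream raw) throws IOException;

      /** Wraps a raw stream with a compressing stream. */
      OutputStream compress(OutputStream raw) throws IOException;

      /** Core SDK side: collect whatever implementations are on the classpath. */
      static Map<String, CompressionRegistrar> loadAll() {
        Map<String, CompressionRegistrar> found = new HashMap<>();
        for (CompressionRegistrar r : ServiceLoader.load(CompressionRegistrar.class)) {
          found.put(r.getName(), r);
        }
        return found;
      }
    }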
>
> On Fri, Dec 6, 2019 at 5:09 AM Amogh Tiwari  wrote:
>
>> While studying the code, we found that the airlift/aircompressor library
>> only requires some classes which are also present in the Apache Hadoop Common
>> package. Therefore, we are now thinking that we could make changes in the
>> airlift/aircompressor package, remove
>> com.facebook.presto.hadoop, and use the existing org.apache.hadoop
>> package which is
>> already included in Beam. This would solve both #2 and #3, as the transitive
>> dependency would be removed and the size would also be reduced by almost
>> 20 MB.
>>
>> But if we use this approach, we will have to manually change the util
>> whenever any changes are made to the airlift library.
>>
>> On Wed, Dec 4, 2019 at 10:13 PM Luke Cwik  wrote:
>>
>>> Going with the Registrar/ServiceLoader route would allow for alternative
>>> providers for the same compression algorithms, so if users don't like one
>>> they can always contribute a different one.
>>>
>>> On Wed, Dec 4, 2019 at 8:22 AM Ismaël Mejía  wrote:
>>>
 (1) does not seem to be the issue because it is Apache licensed.
 (2) and (3) are the big issues, because it requires a huge provided
 uber jar that essentially leaks Hadoop classes into the core SDK [1], so it is
 definitely concerning.

 We discussed at some point, during the PR that added Zstandard support,
 creating some sort of Registrar for compression algorithms [2], but we
 decided not to go ahead because we could achieve that for the zstd case via
 the optional dependencies of commons-compress. Maybe it is time to
 reconsider whether such a mechanism is worthwhile, for example for users who
 may not want the Hadoop leakage just to be able to use LZO.

 Refs.
 [1] https://mvnrepository.com/artifact/io.airlift/aircompressor/0.16
 [2] https://issues.apache.org/jira/browse/BEAM-6422




 On Tue, Dec 3, 2019 at 7:01 PM Robert Bradshaw 
 wrote:

> Is there a way to wrap this up as an optional dependency with multiple
> possible providers, if there's no good library satisfying all of the
> conditions (in particular (1))?
>
> On Tue, Dec 3, 2019 at 9:47 AM Luke Cwik  wrote:
> >
> > I was hoping that someone in the community would provide some
> alternatives since there are quite a few implementations.
> >
> > On Tue, Dec 3, 2019 at 8:20 AM Amogh Tiwari 
> wrote:
> >>
> >> Hi Luke,
> >>
> >> I agree with your thoughts and observations. But
> airlift:aircompressor is the only implementation of LZO in pure Java. That
> straight away solves #5.
> >> The other implementations that I found either have licensing issues
> (since LZO natively uses the GNU GPL licence) or are implemented using .c, .h
> and JNI (which again makes them dependent on the OS). Please refer to
> twitter/hadoop-lzo and shevek/lzo-java.
> >> These were the main reasons why we based this on
>