Re: [DISCUSS] Drop Spark 1.x support to focus on Spark 2.x

2017-11-26 Thread Jean-Baptiste Onofré

Hi all,

Quick update about the Spark 2.x runner: I updated the PR so that it now 
targets Spark 2.x only:


https://github.com/apache/beam/pull/3808

I will rebase and run new tests as soon as GitBox is back.

Don't hesitate to take a look and review.

Thanks !
Regards
JB

On 11/21/2017 08:32 AM, Jean-Baptiste Onofré wrote:

Hi Tim,

I will update the PR today for a new review round. Yes, you are correct: the 
target is 2.3.0 for end of this year (with announcement in the Release Notes).


Regards
JB

On 11/20/2017 10:09 PM, Tim wrote:

Thanks JB

 From which release will Spark 1.x be dropped please? Is this slated for 2.3.0 
at the end of the year?


Thanks,
Tim


On 20 Nov 2017, at 21:21, Jean-Baptiste Onofré  wrote:

Hi,

it seems we have a consensus to upgrade to Spark 2.x, dropping Spark 1.x. I 
will update the PR accordingly.


Thanks all for your input and feedback.

Regards
JB


On 11/13/2017 09:32 AM, Jean-Baptiste Onofré wrote:
Hi Beamers,
I'm forwarding this discussion & vote from the dev mailing list to the user 
mailing list.

The goal is to get your feedback as users.
Basically, we have two options:
1. Right now, in the PR, we support both Spark 1.x and 2.x using three 
artifacts (common, spark1, spark2). You, as users, pick spark1 or spark2 
in your dependency set depending on the Spark version you target.
2. The other option is to upgrade and focus on Spark 2.x in Beam 2.3.0. If 
you still want to use Spark 1.x, you will have to stay on Beam 2.2.0 or earlier.

Thoughts ?
Thanks !
Regards
JB
-------- Forwarded Message --------
Subject: [VOTE] Drop Spark 1.x support to focus on Spark 2.x
Date: Wed, 8 Nov 2017 08:27:58 +0100
From: Jean-Baptiste Onofré 
Reply-To: dev@beam.apache.org
To: dev@beam.apache.org
Hi all,
as you might know, we are working on Spark 2.x support in the Spark runner.
I'm working on a PR about that:
https://github.com/apache/beam/pull/3808
Today, we have something working with both Spark 1.x and 2.x from a code 
standpoint, but I still have to deal with dependencies. This is the first step 
of the update, as I'm still using RDDs. The second step would be to support 
DataFrames (but for that, I would need PCollection elements with schemas, 
which is another topic that Eugene, Reuven and I are discussing).
However, as all major distributions now ship Spark 2.x, I don't think it's 
required anymore to support Spark 1.x.
If we agree, I will update and clean up the PR to only support and focus on 
Spark 2.x.

So, that's why I'm calling for a vote:
   [ ] +1 to drop Spark 1.x support and upgrade to Spark 2.x only
   [ ] 0 (I don't care ;))
   [ ] -1, I would like to still support Spark 1.x, and so keep support for 
both Spark 1.x and 2.x (please provide specific comment)
This vote is open for 48 hours (I have the commits ready, just waiting for the 
end of the vote to push to the PR).

Thanks !
Regards
JB


--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com






GitBox down

2017-11-26 Thread Jean-Baptiste Onofré

Hi,

it seems GitBox is down (timing out).

I don't see anything on status.apache.org, so I will ping INFRA.

Regards
JB
--
Jean-Baptiste Onofré
jbono...@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com


[GitHub] jbonofre commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the Spark runner

2017-11-26 Thread GitBox
jbonofre commented on issue #3808: [BEAM-1920] Add a Spark 2.x support in the 
Spark runner
URL: https://github.com/apache/beam/pull/3808#issuecomment-347089270
 
 
   Following the vote on the mailing lists, I updated the PR with Spark 2.x 
update only (no support of Spark 1 anymore).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058006
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+    return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+    # proxy_info_from_url requires a protocol, which is always http or https.
 
 Review comment:
   Log a warning or raise an error ?




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057990
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
 
 Review comment:
   Log a warning that proxy_env_var is ignored.




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058069
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+    return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+    # proxy_info_from_url requires a protocol, which is always http or https.
+    proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+    http_class: Optional custom Http class to use.
+    **kwargs: Arguments to pass to http_class constructor.
+  Returns:
+    An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+    proxy_type=3,
+    proxy_host=None,
+    proxy_port=None,
+    proxy_user=None,
+    proxy_pass=None,
+    proxy_rdns=None
+  )
+
+  for proxy_env_var in ['http_proxy', 'HTTP_PROXY', 'https_proxy', 'HTTPS_PROXY']:
 
 Review comment:
   Why do we have to mention all these values here ? Isn't it enough to use a 
single variable (say, http_proxy) and ask users to always use that ?
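
   For context on the question above: tools differ on whether they read the 
lower-case or upper-case spelling, and on whether HTTPS traffic gets its own 
variable. A minimal sketch, using only the standard library, of picking the 
first variable that is set. The helper name `first_proxy_setting` and the 
precedence order are assumptions for illustration, not what the PR does.

```python
import os

# Hypothetical precedence order; the PR under review may iterate differently.
PROXY_ENV_VARS = ('https_proxy', 'HTTPS_PROXY', 'http_proxy', 'HTTP_PROXY')


def first_proxy_setting(environ=None):
  """Returns (variable_name, value) for the first proxy variable that is set.

  Returns (None, None) when no candidate variable has a non-empty value.
  """
  environ = os.environ if environ is None else environ
  for name in PROXY_ENV_VARS:
    value = environ.get(name)
    if value:
      return name, value
  return None, None
```

   Passing a plain dict instead of os.environ keeps the lookup easy to test, 
and supporting a single variable only would amount to shrinking the tuple.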




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058033
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+    return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+    # proxy_info_from_url requires a protocol, which is always http or https.
+    proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+    http_class: Optional custom Http class to use.
+    **kwargs: Arguments to pass to http_class constructor.
+  Returns:
+    An initialized httplib2.Http instance.
+  """
+  proxy_info = httplib2.ProxyInfo(
+    proxy_type=3,
+    proxy_host=None,
 
 Review comment:
   Why do we have to specify 'None' values here instead of leaving default ?




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058019
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
 
 Review comment:
   Nit: s/Environment/environment




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057981
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
 
 Review comment:
   What should the format of this be ?




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153057996
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+    return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
 
 Review comment:
   Why '_' ?
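
   For context, a small sketch of what the split does: the variable *name* 
(not its value) is lower-cased and cut at the first underscore, so 
`HTTPS_PROXY` yields `https`, which is then used as the URL scheme when the 
value lacks one. The function names `protocol_from_env_var` and `with_scheme` 
are hypothetical stand-ins for the lines under review.

```python
def protocol_from_env_var(proxy_env_var):
  """Derives the URL scheme from an environment variable *name*.

  Mirrors the reviewed line: lower-case the name and keep the text before
  the first underscore.
  """
  return proxy_env_var.lower().split('_')[0]


def with_scheme(proxy_url, proxy_env_var):
  """Prefixes proxy_url with a scheme when it does not already start with one."""
  if proxy_url.lower().startswith('http'):
    return proxy_url
  return protocol_from_env_var(proxy_env_var) + '://' + proxy_url
```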




[GitHub] chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS

2017-11-26 Thread GitBox
chamikaramj commented on a change in pull request #4136: [BEAM-3184] Added 
ProxyInfoFromEnvironmentVar() & GetNewHttp() methods for GCS
URL: https://github.com/apache/beam/pull/4136#discussion_r153058028
 
 

 ##
 File path: sdks/python/apache_beam/io/gcp/gcsio.py
 ##
 @@ -87,6 +87,50 @@
 MAX_BATCH_OPERATION_SIZE = 100
 
 
+def ProxyInfoFromEnvironmentVar(proxy_env_var):
+  """Reads proxy info from the environment and converts to httplib2.ProxyInfo.
+  Args:
+    proxy_env_var: Environment variable string to read, such as http_proxy or
+       https_proxy.
+  Returns:
+    httplib2.ProxyInfo constructed from the environment string.
+  """
+  proxy_url = os.environ.get(proxy_env_var)
+  if not proxy_url or not proxy_env_var.lower().startswith('http'):
+    return httplib2.ProxyInfo(httplib2.socks.PROXY_TYPE_HTTP, None, 0)
+  proxy_protocol = proxy_env_var.lower().split('_')[0]
+  if not proxy_url.lower().startswith('http'):
+    # proxy_info_from_url requires a protocol, which is always http or https.
+    proxy_url = proxy_protocol + '://' + proxy_url
+  return httplib2.proxy_info_from_url(proxy_url, method=proxy_protocol)
+
+def GetNewHttp(http_class=httplib2.Http, **kwargs):
+  """Creates and returns a new httplib2.Http instance.
+  Args:
+    http_class: Optional custom Http class to use.
 
 Review comment:
   Nit: s/Optional/optional and s/Arguments/arguments




[GitHub] luke-zhu opened a new pull request #4176: [BEAM-3143] Type Inference Compatibility with Python 3

2017-11-26 Thread GitBox
luke-zhu opened a new pull request #4176: [BEAM-3143] Type Inference 
Compatibility with Python 3
URL: https://github.com/apache/beam/pull/4176
 
 
   Builds off Holden's work to get a type inference solution that passes 
precommit tests on Python 2 and type inference unit tests on Python 3.5.
   
   The disassembler code may need more changes if we aim for 3.6+ due to the 
bytecode to wordcode change. I've ported some code from CPython's lib/dis to 
disassembly.py. This should make any future migration to 3.6+ easier.
   




[GitHub] tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a ClassCastException in Apex runner

2017-11-26 Thread GitBox
tweise commented on issue #4074: [BEAM-3130] View.asMap() causes a 
ClassCastException in Apex runner
URL: https://github.com/apache/beam/pull/4074#issuecomment-347068452
 
 
   @jkff tests are added




Re: [VOTE] Release 2.2.0, release candidate #4

2017-11-26 Thread Reuven Lax
FYI I'm still working on finalizing the release. The issue is that the Beam
documentation underwent a large refactor while the release was pending, so
the website PR (pr/337) conflicts. I think it's safer to give up on
resolving these conflicts, so I'm simply going to regenerate this PR from
scratch.

Reuven

On Sat, Nov 25, 2017 at 11:13 PM, Reuven Lax  wrote:

> Sure, I was mostly surprised it was taking this long. However Robert says
> it sometimes takes three days.
>
> Reuven
>
> On Sat, Nov 25, 2017 at 9:11 PM, Jean-Baptiste Onofré 
> wrote:
>
>> mvnrepository doesn't matter (it will sync later), the actual Central URL
>> is:
>>
>> http://repo.maven.apache.org/maven2/org/apache/beam/beam-sdks-java-core/
>>
>> And 2.2.0 is there.
>>
>> Regards
>> JB
>>
>> On 11/25/2017 08:01 PM, Reuven Lax wrote:
>>
>>> BTW,
>>>
>>> It's been over a day, and I still don't see 2.2.0 listed at
>>> https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core
>>> How long does it usually take to promote the artifacts here?
>>>
>>> On Fri, Nov 24, 2017 at 3:43 PM, Reuven Lax  wrote:
>>>
>>>> Appears to be a problem :)
>>>>
>>>> I tried publishing the latest artifact from Apache Nexus to Maven Central.
>>>> After clicking publish, Nexus claimed that the operation had completed.
>>>> However, a look at the Maven Central page
>>>> (https://mvnrepository.com/artifact/org.apache.beam/beam-sdks-java-core)
>>>> does not show 2.2.0 artifacts, and the staging repository has now vanished
>>>> from the Nexus site!
>>>> Does anyone know what happened here?
>>>>
>>>> Reuven
>>>>
>>>> On Wed, Nov 22, 2017 at 11:04 PM, Thomas Weise wrote:
>>>>
>>>>> +1
>>>>>
>>>>> Ran quickstart with Apex runner in embedded mode and on YARN.
>>>>>
>>>>> It needed a couple of tweaks to get there though.
>>>>>
>>>>> 1) Change quickstart pom.xml apex-runner profile:
>>>>>
>>>>>     <dependency>
>>>>>       <groupId>org.apache.hadoop</groupId>
>>>>>       <artifactId>hadoop-yarn-client</artifactId>
>>>>>       <version>${hadoop.version}</version>
>>>>>       <scope>runtime</scope>
>>>>>     </dependency>
>>>>>     <dependency>
>>>>>       <groupId>org.apache.hadoop</groupId>
>>>>>       <artifactId>hadoop-common</artifactId>
>>>>>       <version>${hadoop.version}</version>
>>>>>       <scope>runtime</scope>
>>>>>     </dependency>
>>>>>
>>>>> 2) After copying the fat jar to the cluster:
>>>>>
>>>>> java -cp word-count-beam-bundled-0.1.jar org.apache.beam.examples.WordCount \
>>>>>   --inputFile=file:///tmp/input.txt --output=/tmp/counts \
>>>>>   --embeddedExecution=false --configFile=beam-runners-apex.properties \
>>>>>   --runner=ApexRunner
>>>>>
>>>>> (this was on a single node cluster, hence the local file path)
>>>>>
>>>>> The quickstart instructions suggest using *mvn exec:java* instead of
>>>>> *java* - it generally isn't valid to assume that mvn and a build
>>>>> environment exist on the edge node of a YARN cluster.
>>>>>
>>>>> On Wed, Nov 22, 2017 at 2:12 PM, Nishu wrote:
>>>>>
>>>>>> Hi Eugene,
>>>>>>
>>>>>> I ran it on both standalone Flink (non-YARN) and Flink on an HDInsight
>>>>>> cluster (YARN). Both ran successfully. :)
>>>>>>
>>>>>> Regards,
>>>>>> Nishu
>>
>>>>>> On Wed, Nov 22, 2017 at 9:40 PM, Eugene Kirpichov <kirpic...@google.com.invalid> wrote:
>>>>>>
>>>>>>> Thanks Nishu. So, if I understand correctly, your pipelines were running
>>>>>>> on non-YARN, but you're planning to run with YARN?
>>>>>>>
>>>>>>> I meanwhile was able to get Flink running on Dataproc (YARN), and
>>>>>>> validated quickstart and game examples.
>>>>>>> At this point we need validation for Spark and Flink non-YARN [I think if
>>>>>>> Nishu's runs were non-YARN, they'd give us enough confidence, combined
>>>>>>> with the success of other validations of Spark and Flink runners?], and
>>>>>>> Apex on YARN. However, it seems that in previous RCs we were not
>>>>>>> validating Apex on YARN, only local cluster. Is it needed this time?
>>>>>>>
>>>>>>> On Wed, Nov 22, 2017 at 12:28 PM Nishu wrote:
>>>>>>>
>>>>>>>> Hi Eugene,
>>>>>>>>
>>>>>>>> No, I didn't try with those; instead I have my custom pipeline where a
>>>>>>>> Kafka topic is the source. I have defined a Global Window and a
>>>>>>>> processing time trigger to read the data. Further, it runs some
>>>>>>>> transformations, i.e. GroupByKey and CoGroupByKey, on the windowed
>>>>>>>> collections.
>>>>>>>> I was running the same pipeline on direct runner and spark runner
>>>>>>>> earlier.
>>>>>>>> Today gave it a try w

[GitHub] rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous 
classes from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347035725
 
 
   Something like

   Transform Foo$1 conflicts with Foo$2 in pipeline defined in MyTest line 
56. You can fix it by adding a name in the apply invocations at lines 54 and 52.
   
   The long name is important because otherwise you have no clue about the 
provenance.




[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347029760
 
 
   Yeah, I would prefer an improved error message. I suppose you mean the 
message that says `Transform Foo does not have a stable unique name. This will 
prevent reloading of pipelines.` - that message definitely should point to 
specifying the name in .apply() as a fix.




[GitHub] rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
rmannibucau commented on issue #4172: [BEAM-3243] support multiple anonymous 
classes from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347019445
 
 
   I understand where it comes from, but it is very annoying when writing tests, 
where it is not uncommon to write anonymous classes, and inline naming is 
not always doable when you write utility methods or reusable pieces of pipelines.
   
   Note also that numbering anonymous classes is consistent with keeping only 
the suffix in the nested class case, which produces equally meaningless names 
(Important$Stuff becomes Stuff, which is generally not very meaningful for the 
"task" context, since that context is held by the enclosing class to avoid 
long, repetitive names).
   
   The original issue is that when it fails, the error is quite abstract and 
not very obvious. An alternative would be to enrich the error message with:
   
   1. where the anonymous *classes* (all conflicting ones) are
   2. how to fix it - passing a name to the apply
   
   I would be fine with this "not solution" fix as well. Does that sound better 
to you?




[GitHub] jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes from the same enclosing class in a pipeline

2017-11-26 Thread GitBox
jkff commented on issue #4172: [BEAM-3243] support multiple anonymous classes 
from the same enclosing class in a pipeline
URL: https://github.com/apache/beam/pull/4172#issuecomment-347018533
 
 
   Using multiple anonymous DoFn's with the same enclosing class within the 
same composite transform is already possible if you specify the transform name 
in .apply() - e.g.: .apply("Something", ParDo.of(new DoFn..)).apply("Something 
else", ParDo.of(new DoFn..)). This is a good thing rather than a bug, because 
using generated names like `Enclosing$1` is unstable w.r.t. pipeline update: 
any reordering of the anonymous classes, or adding a new one, or making an 
existing one be non-anonymous, will change the numbering and make the pipeline 
update-incompatible.
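
   The instability can be seen in a toy model. The sketch below is not Beam 
code: it only mimics numbering anonymous classes by order of appearance 
(`Enclosing$1`, `Enclosing$2`, ...) and shows that adding one more anonymous 
class renames the ones after it. The function `generated_names` and the step 
labels are hypothetical.

```python
def generated_names(enclosing, anonymous_steps):
  """Maps each step to a name numbered by declaration order, e.g. Enclosing$1."""
  return {step: '%s$%d' % (enclosing, i)
          for i, step in enumerate(anonymous_steps, start=1)}


before = generated_names('Enclosing', ['parse', 'format'])
# A new anonymous class is added between the two existing ones.
after = generated_names('Enclosing', ['parse', 'filter', 'format'])

# Steps whose generated name changed even though their code did not.
renamed = [step for step in before if before[step] != after[step]]
```

   Here `format` moves from `Enclosing$2` to `Enclosing$3`, whereas a name 
passed explicitly to .apply() would have stayed the same.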

