Write access to Apex/Malhar GH

2017-05-05 Thread Ganelin, Ilya
Hello, all – I was attempting to close out one of the open PRs and it seems I 
may not have write access to the repos on GH. How do I resolve that? Do folks 
typically merge patches through the GH UI or just use the git command line?

- Ilya Ganelin




Re: Spurious test failures in master

2017-04-28 Thread Ganelin, Ilya
Minor clarification: the failures are in apex-malhar.

- Ilya Ganelin
[id:image001.png@01D1F7A4.F3D42980]

From: "Ganelin, Ilya" 
Reply-To: "dev@apex.apache.org" 
Date: Friday, April 28, 2017 at 11:18 PM
To: "dev@apex.apache.org" 
Subject: Spurious test failures in master

Jenkins is failing on two PRs with errors that seem unrelated to the contents 
of the PRs. Are the following failures already tracked somewhere in JIRA?
1)
Error Message

Unable to load AWS credentials from the /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties file

Stacktrace

com.amazonaws.AmazonClientException: Unable to load AWS credentials from the /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties file
Caused by: java.io.FileNotFoundException: File doesn't exist: /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties


2)
testApplicationWithPortEndpoint(org.apache.apex.malhar.sql.KafkaEndpointTest)  Time elapsed: 3.474 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.apex.malhar.sql.KafkaEndpointTest.testApplicationWithPortEndpoint(KafkaEndpointTest.java:131)

https://builds.apache.org/job/Apex_Malhar_PR/org.apache.apex$malhar-library/1118/testReport/com.datatorrent.lib.io.jms/SQSStringInputOperatorTest/
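The first failure is environmental: the SQS test expects a credentials file that is absent on the Jenkins slave. As a hedged illustration only (not the actual SQSStringInputOperatorTest code; the file path is taken from the error above), such environment-dependent tests are commonly guarded with a JUnit assumption so a missing file skips the test instead of failing the build:

import java.io.File;

import org.junit.Assume;
import org.junit.Before;
import org.junit.Test;

public class SqsCredentialsGuardTest
{
  @Before
  public void skipWithoutCredentials()
  {
    // Relative to the module directory; path taken from the Jenkins error above.
    File creds = new File("target/test-classes/sqstestCreds.properties");
    // An assumption failure marks the test skipped, not failed, when the file is absent.
    Assume.assumeTrue(creds.exists());
  }

  @Test
  public void testSqsRead()
  {
    // ... the actual SQS test body would go here ...
  }
}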


- Ilya Ganelin





Spurious test failures in master

2017-04-28 Thread Ganelin, Ilya
Jenkins is failing on two PRs with errors that seem unrelated to the contents 
of the PRs. Are the following failures already tracked somewhere in JIRA?
1)
Error Message

Unable to load AWS credentials from the /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties file

Stacktrace

com.amazonaws.AmazonClientException: Unable to load AWS credentials from the /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties file
Caused by: java.io.FileNotFoundException: File doesn't exist: /home/jenkins/jenkins-slave/workspace/Apex_Malhar_PR%402/library/target/test-classes/sqstestCreds.properties


2)
testApplicationWithPortEndpoint(org.apache.apex.malhar.sql.KafkaEndpointTest)  Time elapsed: 3.474 sec  <<< FAILURE!
java.lang.AssertionError: null
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.apex.malhar.sql.KafkaEndpointTest.testApplicationWithPortEndpoint(KafkaEndpointTest.java:131)



- Ilya Ganelin




Dusting off old PRs

2017-04-28 Thread Ganelin, Ilya
Hi all – I have been reviewing some long-stagnant code now that I’ve had a 
moment to breathe and return to OSS ☺

I would be much obliged if folks could take a glance at the following PRs, 
both of which should be in a near-ready state, complete with tests.

https://github.com/apache/apex-malhar/pull/309 (Support for easily merging 
multiple streams)
https://github.com/apache/apex-malhar/pull/315 (Support for a data cloning 
partitioner)

I also have a new PR open here:
https://github.com/apache/apex-malhar/pull/615 (Adds support for Snappy 
compression)

Thanks in advance!

- Ilya Ganelin




Re: PR merge policy

2017-04-28 Thread Ganelin, Ilya
Guess we can all go home then. Our work here is done.

With respect to the discussion below: I think rolling back an improperly 
reviewed PR could be considered disrespectful to the committer who merged it 
in the first place. Such situations, unless they trigger a disaster, should 
be handled by communicating the error to the responsible party and then 
allowing them to resolve it. For example: I improperly commit an unreviewed 
PR; someone notices and sends me an email informing me of my error; I then 
have the responsibility of unrolling the change and getting the appropriate 
review. I think we should start with the premise that we’re here in the 
spirit of collaboration, and we should create opportunities for individuals 
to learn from their mistakes, recognize the importance of particular 
standards (e.g. a good review process leads to stable projects), and 
ultimately internalize these ethics.

Internally, our team has had great success with a policy requiring two PR 
approvals and not allowing the creator of a patch to be the one to merge it. 
While this might feel a little silly, it definitely helps build collaboration 
and familiarity with the code base, and it intrinsically keeps PRs from being 
merged too quickly (without a sufficient period for review).


- Ilya Ganelin
[id:image001.png@01D1F7A4.F3D42980]

From: Pramod Immaneni 
Reply-To: "dev@apex.apache.org" 
Date: Friday, April 28, 2017 at 10:09 AM
To: "dev@apex.apache.org" 
Subject: Re: PR merge policy

On a lighter note, it looks like the powers that be have been listening in on 
this conversation and decided to force push an empty repo, or maybe GitHub 
just decided that this is the best proposal ;)



On Thu, Apr 27, 2017 at 10:47 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:
In this case please propose how to deal with PR merge policy violations in the 
future. I will -1 a proposal to commit an improvement on top of a commit.

Thank you,

Vlad


On 4/27/17 21:48, Pramod Immaneni wrote:
I am sorry but I am -1 on the force push in this case.
On Apr 27, 2017, at 9:27 PM, Thomas Weise <t...@apache.org> wrote:

+1 as a measure of last resort.
On Thu, Apr 27, 2017 at 9:25 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

IMO, force push will bring enough consequent embarrassment to avoid such
behavior in the future.

Thank you,

Vlad
On 4/27/17 21:16, Munagala Ramanath wrote:

My thought was that leaving the bad commit would be a permanent reminder to 
the committer (and others) that a policy violation occurred, and the 
consequent embarrassment would be an adequate deterrent.

Ram

On Thu, Apr 27, 2017 at 9:12 PM, Vlad Rozov <v.ro...@datatorrent.com> wrote:

I also was under the impression that everyone agreed to the policy that gives 
everyone in the community a chance to raise a concern or to propose an 
improvement to a PR. Unfortunately, it is not the case, and we need to 
discuss it again. I hope that this discussion will lead to no future 
violations so we don't need to forcibly undo such commits, but it will be 
good for the community to agree on a policy that deals with violations.

Ram, committing an improvement on top of a commit should be discouraged, 
not encouraged, as it eventually leads to policy violations and lousy PR 
reviews.

Thank you,

Vlad

On 4/27/17 20:54, Thomas Weise wrote:

I also thought that everybody was in agreement about that after the first 
round of discussion, and as you say it would be hard to argue against it. 
And I think we should not have to revisit the same topic a few days later.

While you seem to be focused on the disagreement about the policy violation, 
I'm more interested in a style of collaboration that does not require such 
discussion.

Thomas

On Thu, Apr 27, 2017 at 8:45 PM, Munagala Ramanath <r...@datatorrent.com> wrote:

Everybody seems agreed on what the committers should do -- that waiting a 
day or two for others to have a chance to comment seems like an entirely 
reasonable thing.

The disagreement is about what to do when that policy is violated.

Ram

On Thu, Apr 27, 2017 at 8:37 PM, Thomas Weise <t...@apache.org> wrote:

Forced push should not be necessary if committers follow the development 
process.

So why not focus on what the committer should do? Development process and 
guidelines are there for a reason, and the issue was raised before.

In this specific case, there is a string of commits related to the plugin 
feature that IMO should be part of the original review. There wasn't any 
need to rush this; the change wasn't important for the release.

Thomas


On Thu, Apr 27, 2017 at 8:11 PM, Munagala Ramanath <r...@datatorrent.com> wrote:

I agree with Pramod that force pushing should be a rare event, done only 
when there is an immediate need to undo something serious. Doing it just for 
a policy violation should itself be codified in our policies as a policy 
violation.

Why not just commit an impr…

Re: Why does emit require current thread to be the operator thread?

2017-04-10 Thread Ganelin, Ilya
Neat – idle would be when not in process, begin window, or end window? Is that 
its own event loop or is it a periodic callback?

- Ilya Ganelin


On 4/10/17, 1:58 PM, "Pramod Immaneni"  wrote:

You can also emit when the system is idling by implementing the
IdleTimeHandler interface in your operator.
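A minimal sketch of this idle-time approach (an editor's illustration, not code from the thread; the class, port, and callback names are placeholders, and the queue hand-off follows Amol's description quoted below):

import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

import com.datatorrent.api.DefaultOutputPort;
import com.datatorrent.api.Operator;
import com.datatorrent.common.util.BaseOperator;

public class CallbackEmitOperator extends BaseOperator implements Operator.IdleTimeHandler
{
  public final transient DefaultOutputPort<String> out = new DefaultOutputPort<>();

  // The cleanup thread only enqueues; it never touches the output port.
  private final transient Queue<String> pending = new ConcurrentLinkedQueue<>();

  public void onItemRemoved(String item)
  {
    pending.add(item); // invoked from the cleanup thread's callback
  }

  @Override
  public void handleIdleTime()
  {
    // Runs on the operator thread, so emitting here satisfies the port's thread check.
    String item;
    while ((item = pending.poll()) != null) {
      out.emit(item);
    }
  }
}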

On Mon, Apr 10, 2017 at 1:06 PM, Amol Kekre  wrote:

> Not yet, but we could leverage internal structures of Apex as they do the
> same thing, for example in container-local streams. There is a catch though:
> the queue read by the main thread will only happen when another data tuple
> arrives in a process call, or a control tuple arrives for start or end window.
>
> Thks
> Amol
>
>
>
> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
>
> www.datatorrent.com
>
> On Mon, Apr 10, 2017 at 1:01 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
>
> > Thanks, Amol – that makes sense and was the solution I’d arrived at. I
> > just was trying to avoid the delay between the data being ready and
> > emitting it. Has anyone built a solution where it emits from the parent as
> > soon as it’s ready in the child (assuming I don’t care about order)?
> >
> > - Ilya Ganelin
> >
> >
> > On 4/10/17, 12:45 PM, "Amol Kekre"  wrote:
> >
> > Ilya,
> > This constraint was introduced because allowing two threads to emit data
> > creates lots of bad situations:
> > 1. The emit is triggered between end_window and begin_window. This was a
> > critical blocker.
> > 2. Order is no longer guaranteed, so upon replay you get the wrong order
> > of events within a window. This was something to worry about, but not a
> > blocker.
> >
> > We had users report this problem.
> >
> > The solution is to pass the data to the main thread and have the main
> > thread emit this data during one of the start-window, process, or
> > end-window calls. Ideally during start-window or end-window so as to
> > guarantee order. Keeping this code in start or end window also ensures
> > that the process call remains optimal.
> >
> > Thks
> > Amol
> >
> > E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
> > www.datatorrent.com
> >
> > On Mon, Apr 10, 2017 at 12:39 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
> >
> > > Hello – I’ve got an operator that runs a cleanup thread (separate from
> > > the main event loop) and triggers a callback when an item is removed
> > > from an internal data structure. I would like for this callback to emit
> > > data from one of the operator’s ports, but I run into the following
> > > Exception:
> > >
> > >
> > >
> > > (From DefaultOutputPort.java, line 58)
> > >
> > > if (operatorThread != null && Thread.currentThread() != operatorThread) {
> > >   // only under certain modes: enforce this
> > >   throw new IllegalStateException("Current thread " + Thread.currentThread().getName() +
> > >       " is different from the operator thread " + operatorThread.getName());
> > > }
> > >
> > >
> > >
> > > I could obviously extend DefaultOutputPort to bypass this but I’d like
> > > to understand why that constraint is there and if there’s a good way to
> > > work around it.
> > >
> > >
> > >
> > > Would love to hear the community’s thoughts. Thanks!
> > >
> > >
> > >
> > > - Ilya Ganelin
> > >
> > >

Re: Why does emit require current thread to be the operator thread?

2017-04-10 Thread Ganelin, Ilya
Thanks, Amol – that makes sense and was the solution I’d arrived at. I was 
just trying to avoid the delay between the data being ready and emitting it. 
Has anyone built a solution where it emits from the parent as soon as it’s 
ready in the child (assuming I don’t care about order)?

- Ilya Ganelin


On 4/10/17, 12:45 PM, "Amol Kekre"  wrote:

Ilya,
This constraint was introduced because allowing two threads to emit data
creates lots of bad situations:
1. The emit is triggered between end_window and begin_window. This was a
critical blocker.
2. Order is no longer guaranteed, so upon replay you get the wrong order of
events within a window. This was something to worry about, but not a blocker.

We had users report this problem.

The solution is to pass the data to the main thread and have the main thread
emit this data during one of the start-window, process, or end-window calls.
Ideally during start-window or end-window so as to guarantee order. Keeping
this code in start or end window also ensures that the process call remains
optimal.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre

www.datatorrent.com


On Mon, Apr 10, 2017 at 12:39 PM, Ganelin, Ilya  wrote:

> Hello – I’ve got an operator that runs a cleanup thread (separate from the
> main event loop) and triggers a callback when an item is removed from an
> internal data structure. I would like for this callback to emit data from
> one of the operator’s ports, but I run into the following Exception:
>
>
>
> (From DefaultOutputPort.java, line 58)
>
> if (operatorThread != null && Thread.currentThread() != operatorThread) {
>   // only under certain modes: enforce this
>   throw new IllegalStateException("Current thread " + Thread.currentThread().getName() +
>       " is different from the operator thread " + operatorThread.getName());
> }
>
>
>
> I could obviously extend DefaultOutputPort to bypass this but I’d like
> to understand why that constraint is there and if there’s a good way to
> work around it.
>
>
>
> Would love to hear the community’s thoughts. Thanks!
>
>
>
> - Ilya Ganelin
>
>






Why does emit require current thread to be the operator thread?

2017-04-10 Thread Ganelin, Ilya
Hello – I’ve got an operator that runs a cleanup thread (separate from the main 
event loop) and triggers a callback when an item is removed from an internal 
data structure. I would like for this callback to emit data from one of the 
operator’s ports, but I run into the following Exception:

(From DefaultOutputPort.java, line 58)
if (operatorThread != null && Thread.currentThread() != operatorThread) {
  // only under certain modes: enforce this
  throw new IllegalStateException("Current thread " + Thread.currentThread().getName() +
      " is different from the operator thread " + operatorThread.getName());
}

I could obviously extend DefaultOutputPort to bypass this, but I’d like to 
understand why that constraint is there and whether there’s a good way to 
work around it.

Would love to hear the community’s thoughts. Thanks!

- Ilya Ganelin




Re: [GitHub] apex-malhar pull request #587: APEXMALHAR-2453 Sort Accumulation for Windowe...

2017-03-21 Thread Ganelin, Ilya
To unsubscribe please follow the links here:
https://apex.apache.org/community.html

- Ilya Ganelin


On 3/21/17, 1:29 PM, "Herger, Brendan"  wrote:

unsubscribe

Thanks,
Brendan Herger
Machine Learning Engineer
Center for Machine Learning @ Capital One
415.582.7457 (cell)
 

On 3/21/17, 9:36 AM, "ajaygit158"  wrote:

GitHub user ajaygit158 opened a pull request:

https://github.com/apache/apex-malhar/pull/587

APEXMALHAR-2453 Sort Accumulation for Windowed operators



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ajaygit158/apex-malhar APEXMALHAR-2453

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/apex-malhar/pull/587.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #587


commit c63a8fd1766e81abfa3b018a98b345c2b8fe17df
Author: ajaygit158 
Date:   2017-03-21T11:50:19Z

APEXMALHAR-2453 Added Sort Accumulation for Windowed operator

commit 7b1cc6c8d0119ceff1d00530964c6589b3e92e55
Author: ajaygit158 
Date:   2017-03-21T16:08:16Z

APEXMALHAR-2453 Sort Accumulation for Windowed operators




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---






Re: NO_LOCAL_WRITE Error from Stram

2017-03-03 Thread Ganelin, Ilya
Minor amendment: the cluster runs hadoop-2.6.0+cdh5.8.0+1592 (2.6, not 2.7).


- Ilya Ganelin

From: "Ganelin, Ilya" 
Reply-To: "dev@apex.apache.org" 
Date: Friday, March 3, 2017 at 1:59 PM
To: "dev@apex.apache.org" 
Subject: NO_LOCAL_WRITE Error from Stram

Hello, all – a new error is cropping up at application start, and Google does 
not offer any helpful guidance.

2017-03-03 16:49:02,558 INFO com.datatorrent.common.util.AsyncFSStorageAgent: using /opt/cloudera/hadoop/1/yarn/nm/usercache/XX/appcache/application_1483979920683_0448/container_e97905_1483979920683_0448_01_01/tmp/chkp3008940656673217935 as the basepath for checkpointing.

2017-03-03 16:49:02,739 ERROR com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
2017-03-03 16:49:02,739 ERROR com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
java.lang.NoSuchFieldError: NO_LOCAL_WRITE
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1909)
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1938)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1996)
    at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1786)
    at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:105)
    at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:59)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:680)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:676)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.create(FileContext.java:676)
    at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:118)
    at com.datatorrent.stram.plan.physical.PhysicalPlan.initCheckpoint(PhysicalPlan.java:1236)
    at com.datatorrent.stram.plan.physical.PhysicalPlan.<init>(PhysicalPlan.java:495)
    at com.datatorrent.stram.StreamingContainerManager.<init>(StreamingContainerManager.java:418)
    at com.datatorrent.stram.StreamingContainerManager.getInstance(StreamingContainerManager.java:3065)
    at com.datatorrent.stram.StreamingAppMasterService.serviceInit(StreamingAppMasterService.java:552)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:102)

Hadoop version is 2.7.2, Apex-core 3.5.0, Gateway 3.7.0, Java 1.8

Any thoughts? Thanks!
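A NoSuchFieldError deep inside the HDFS client usually indicates mixed Hadoop versions on the classpath, consistent with the 2.6-vs-2.7 amendment above (and with Vlad's advice in the FailoverProxyProvider thread below to keep Hadoop jars out of the application package). A hedged diagnostic sketch (an editor's illustration, not from the thread) that logs which Hadoop build was actually loaded:

import org.apache.hadoop.util.VersionInfo;

public class HadoopVersionCheck
{
  public static void main(String[] args)
  {
    // Reports the version of the hadoop-common jar actually on the classpath,
    // e.g. a CDH 2.6.0 build versus the 2.7.2 the application was built against.
    System.out.println("Hadoop version: " + VersionInfo.getVersion());
  }
}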

- Ilya Ganelin





NO_LOCAL_WRITE Error from Stram

2017-03-03 Thread Ganelin, Ilya
Hello, all – a new error is cropping up at application start, and Google does 
not offer any helpful guidance.

2017-03-03 16:49:02,558 INFO com.datatorrent.common.util.AsyncFSStorageAgent: using /opt/cloudera/hadoop/1/yarn/nm/usercache/XX/appcache/application_1483979920683_0448/container_e97905_1483979920683_0448_01_01/tmp/chkp3008940656673217935 as the basepath for checkpointing.

2017-03-03 16:49:02,739 ERROR com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
2017-03-03 16:49:02,739 ERROR com.datatorrent.stram.StreamingAppMaster: Exiting Application Master
java.lang.NoSuchFieldError: NO_LOCAL_WRITE
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1909)
    at org.apache.hadoop.hdfs.DFSOutputStream.<init>(DFSOutputStream.java:1938)
    at org.apache.hadoop.hdfs.DFSOutputStream.newStreamForCreate(DFSOutputStream.java:1996)
    at org.apache.hadoop.hdfs.DFSClient.primitiveCreate(DFSClient.java:1786)
    at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:105)
    at org.apache.hadoop.fs.Hdfs.createInternal(Hdfs.java:59)
    at org.apache.hadoop.fs.AbstractFileSystem.create(AbstractFileSystem.java:577)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:680)
    at org.apache.hadoop.fs.FileContext$3.next(FileContext.java:676)
    at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
    at org.apache.hadoop.fs.FileContext.create(FileContext.java:676)
    at com.datatorrent.common.util.AsyncFSStorageAgent.copyToHDFS(AsyncFSStorageAgent.java:118)
    at com.datatorrent.stram.plan.physical.PhysicalPlan.initCheckpoint(PhysicalPlan.java:1236)
    at com.datatorrent.stram.plan.physical.PhysicalPlan.<init>(PhysicalPlan.java:495)
    at com.datatorrent.stram.StreamingContainerManager.<init>(StreamingContainerManager.java:418)
    at com.datatorrent.stram.StreamingContainerManager.getInstance(StreamingContainerManager.java:3065)
    at com.datatorrent.stram.StreamingAppMasterService.serviceInit(StreamingAppMasterService.java:552)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at com.datatorrent.stram.StreamingAppMaster.main(StreamingAppMaster.java:102)

Hadoop version is 2.7.2, Apex-core 3.5.0, Gateway 3.7.0, Java 1.8

Any thoughts? Thanks!

- Ilya Ganelin




Re: FailoverProxyProvider Error on Launch

2017-03-03 Thread Ganelin, Ilya
Thanks, that resolved the issue.

- Ilya Ganelin

From: Vlad Rozov 
Organization: DataTorrent
Reply-To: "dev@apex.apache.org" 
Date: Friday, March 3, 2017 at 1:37 PM
To: "dev@apex.apache.org" 
Subject: Re: FailoverProxyProvider Error on Launch

Please check that your application package does not include hadoop jars.
Thank you,

Vlad

On 3/3/17 11:37, Ganelin, Ilya wrote:
Hi, all – my application submits to the gateway, is accepted, but then fails 
with:

2017-03-03 14:30:50,505 INFO org.apache.hadoop.service.AbstractService: Service com.datatorrent.stram.StreamingAppMasterService failed in state INITED; cause: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:129)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:155)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:240)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:332)
    ….

Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy()Ljava/lang/Object;
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:181)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:708)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:651)
    at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:90)
    ... 47 more

This seems related to 
http://stackoverflow.com/questions/38262064/exception-while-running-spark-submit-on-hadoop-cluster-with-highavailability

But I don’t know how this applies in the current situation. Wouldn’t DT 
retrieve the core-site/hdfs-site from the Hadoop path we provide when setting 
up the gateway?

Our cluster is now deployed as an HA cluster, which wasn’t the case the last 
time we ran Apex.

Any ideas on how to resolve this would be much appreciated. Thanks!

- Ilya Ganelin





FailoverProxyProvider Error on Launch

2017-03-03 Thread Ganelin, Ilya
Hi, all – my application submits to the gateway, is accepted, but then fails 
with:

2017-03-03 14:30:50,505 INFO org.apache.hadoop.service.AbstractService: Service com.datatorrent.stram.StreamingAppMasterService failed in state INITED; cause: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.fs.AbstractFileSystem.newInstance(AbstractFileSystem.java:129)
    at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:155)
    at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:240)
    at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:332)
    ….

Caused by: java.lang.AbstractMethodError: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider.getProxy()Ljava/lang/Object;
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:73)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.<init>(RetryInvocationHandler.java:64)
    at org.apache.hadoop.io.retry.RetryProxy.create(RetryProxy.java:58)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:181)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:708)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:651)
    at org.apache.hadoop.fs.Hdfs.<init>(Hdfs.java:90)
    ... 47 more

This seems related to 
http://stackoverflow.com/questions/38262064/exception-while-running-spark-submit-on-hadoop-cluster-with-highavailability

But I don’t know how this applies in the current situation. Wouldn’t DT 
retrieve the core-site/hdfs-site from the Hadoop path we provide when setting 
up the gateway?

Our cluster is now deployed as an HA cluster, which wasn’t the case the last 
time we ran Apex.

Any ideas on how to resolve this would be much appreciated. Thanks!

- Ilya Ganelin




Re: Local mode execution

2017-03-03 Thread Ganelin, Ilya
Apologies – I missed his response due to an email filter. Found it now, thanks!

- Ilya Ganelin


On 3/3/17, 9:19 AM, "Munagala Ramanath"  wrote:

There was a response from Thomas 2 days ago which also included a code
fragment.

Let me know if you are unable to locate it and I'll copy it here.

Ram
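Thomas's code fragment is not reproduced in this digest; as a hedged sketch of the replacement Launcher API (an editor's illustration assuming apex-api's EmbeddedAppLauncher, which may differ from Thomas's fragment; MyApplication stands in for your own StreamingApplication), an embedded launch looks roughly like:

import org.apache.apex.api.EmbeddedAppLauncher;
import org.apache.apex.api.Launcher;
import org.apache.apex.api.Launcher.LaunchMode;
import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.Attribute;

public class LocalLaunch
{
  public static void main(String[] args) throws Exception
  {
    EmbeddedAppLauncher<?> launcher = Launcher.getLauncher(LaunchMode.EMBEDDED);
    Attribute.AttributeMap launchAttributes = new Attribute.AttributeMap.DefaultAttributeMap();
    launchAttributes.put(EmbeddedAppLauncher.RUN_MILLIS, 10000L); // run ~10s, then stop
    // MyApplication is a placeholder StreamingApplication implementation.
    launcher.launchApp(new MyApplication(), new Configuration(false), launchAttributes);
  }
}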

On Fri, Mar 3, 2017 at 9:10 AM, Ganelin, Ilya 
wrote:

> Hey all - any word on this? Would love to be able to test locally using
> the new framework. Thanks!
>
>
>
>
> ________
> From: Ganelin, Ilya 
> Sent: Wednesday, March 1, 2017 12:23:06 PM
> To: dev@apex.apache.org
> Subject: Local mode execution
>
> Hi all, I’ve returned to writing Apex apps after a hiatus, it seems that
> the LocalMode is now deprecated, having been replaced by the Launcher
> interface. Is there an example or documentation anywhere of using this new
> API?
>
> Please let me know, thanks!
>
> - Ilya Ganelin
>
> 
>



--
Munagala V. Ramanath
Software Engineer
E: r...@datatorrent.com | M: (408) 331-5034 | Twitter: @UnknownRam
www.datatorrent.com | apex.apache.org






RE: Local mode execution

2017-03-03 Thread Ganelin, Ilya
Hey all - any word on this? Would love to be able to test locally using the new 
framework. Thanks!





From: Ganelin, Ilya 
Sent: Wednesday, March 1, 2017 12:23:06 PM
To: dev@apex.apache.org
Subject: Local mode execution

Hi all, I’ve returned to writing Apex apps after a hiatus, it seems that the 
LocalMode is now deprecated, having been replaced by the Launcher interface. Is 
there an example or documentation anywhere of using this new API?

Please let me know, thanks!

- Ilya Ganelin





RE: Operator Node Affinity

2017-03-02 Thread Ganelin, Ilya
Thanks, Amol – that's the solution in a nutshell. I think static ports are 
fine for our environment, so the simpler solution is preferable. I'm hoping 
to roll out the app tomorrow to give it a shot. Will keep you posted!





From: Amol Kekre 
Sent: Thursday, March 2, 2017 5:15:42 PM
To: dev@apex.apache.org
Subject: Re: Operator Node Affinity

Ilya,
Put all nodes on the load-balancer list, and only the ones that get the
operator JVM will respond to the load-balancer's status URL. One place where
you have to tweak "do not depend on host/port of a distributed OS" is the
port number: I believe the port the load-balancer uses is fixed. You could
use a proxy that periodically figures out the port and host and redirects,
but then you have an extra hardware hop in between (uptime issue?) that
negates the load-balancer play a little. You could do a two-proxy-server
solution.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
www.datatorrent.com | apex.apache.org

On Thu, Mar 2, 2017 at 8:59 AM, Ganelin, Ilya wrote:

> Thanks – the solution I’m leaning towards is to deploy a load balancer
> with a list of the nodes in the cluster; once Apex spins up, the load
> balancer should be able to establish connections to the deployed operators
> and route data appropriately.
>
> - Ilya Ganelin
>
>
> On 3/2/17, 8:34 AM, "Amol Kekre"  wrote:
>
> Ilya,
> As Thomas says, attaching a JVM to an operator is do-able, but it is against
> the norm in a distributed cluster. A distributed OS cannot guarantee a
> node: it could be down or not have resources. A ZK way or any other way
> to discover post deployment is the way to go. I think a webservice call
> through Stram to get the specifics will work too.
>
> Thks
> Amol
>
>
>
> E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
> www.datatorrent.com | apex.apache.org
>
> On Wed, Mar 1, 2017 at 8:16 PM, Thomas Weise  wrote:
>
> > If I understand it correctly you want to run a server in an operator,
> > discover its endpoint, and push data to it? The preferred way of doing
> > that would be to announce the endpoint through a discovery mechanism
> > (such as ZooKeeper or a shared file) that the upstream entity can use to
> > find the endpoint.
> >
> > If you are looking for a way to force deploy on a specific node, then
> > have a look at the OperatorContext.LOCALITY_HOST attribute (and also
> > AffinityRulesTest). AFAIK you can use a specific host name and the
> > scheduler will make a best effort to get a container on that host, but
> > there isn't a guarantee. Generally, services running on the cluster
> > shouldn't make assumptions about hosts and ports and should use
> > discovery instead.
> >
> > HTH,
> > Thomas
> >
> >
> > On Wed, Mar 1, 2017 at 7:53 PM, Ganelin, Ilya <ilya.gane...@capitalone.com> wrote:
> >
> > > Hello, all – is there any way to deploy a given operator to a specific
> > > node? E.g. if I’m trying to create a listener for a TCP socket that can
> > > then pipe data to a DAG, is there any way for the location of that
> > > listener to be deterministic so an upstream entity knows what to connect to?
> > >
> > > - Ilya Ganelin

Re: Operator Node Affinity

2017-03-02 Thread Ganelin, Ilya
Thanks – the solution I’m leaning towards is to deploy a load balancer with a 
list of the nodes in the cluster; once Apex spins up, the load balancer should 
be able to establish connections to the deployed operators and route data 
appropriately.

- Ilya Ganelin


On 3/2/17, 8:34 AM, "Amol Kekre"  wrote:

Ilya,
As Thomas says, attaching a JVM to an operator is do-able, but it is against
the norm in a distributed cluster. A distributed OS cannot guarantee a
node: it could be down or not have resources. A ZK way or any other way
to discover post deployment is the way to go. I think a webservice call
through Stram to get the specifics will work too.

Thks
Amol



E:a...@datatorrent.com | M: 510-449-2606 | Twitter: @amolhkekre
www.datatorrent.com | apex.apache.org

On Wed, Mar 1, 2017 at 8:16 PM, Thomas Weise  wrote:

> If I understand it correctly you want to run a server in an operator,
> discover its endpoint, and push data to it? The preferred way of doing that
> would be to announce the endpoint through a discovery mechanism (such as
> ZooKeeper or a shared file) that the upstream entity can use to find the
> endpoint.
>
> If you are looking for a way to force deploy on a specific node, then have
> a look at the OperatorContext.LOCALITY_HOST attribute (and also
> AffinityRulesTest). AFAIK you can use a specific host name and the
> scheduler will make a best effort to get a container on that host, but
> there isn't a guarantee. Generally, services running on the cluster
> shouldn't make assumptions about hosts and ports and should use discovery
> instead. [A sketch of the LOCALITY_HOST attribute follows at the end of
> this message.]
>
> HTH,
> Thomas
>
>
On Wed, Mar 1, 2017 at 7:53 PM, Ganelin, Ilya wrote:
>
> > Hello, all – is there any way to deploy a given operator to a specific
> > node? E.g. if I’m trying to create a listener for a TCP socket that can
> > then pipe data to a DAG, is there any way for the location of that
> > listener to be deterministic so an upstream entity knows what to connect to?
> >
> > - Ilya Ganelin
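A minimal sketch of the LOCALITY_HOST approach Thomas describes above (an editor's illustration only; MyTcpListener and the host name are placeholders, and per his caveat the placement is best effort, not guaranteed):

import org.apache.hadoop.conf.Configuration;

import com.datatorrent.api.Context;
import com.datatorrent.api.DAG;
import com.datatorrent.api.StreamingApplication;

public class PinnedListenerApp implements StreamingApplication
{
  @Override
  public void populateDAG(DAG dag, Configuration conf)
  {
    // MyTcpListener is a placeholder Operator implementation.
    MyTcpListener listener = dag.addOperator("tcpListener", new MyTcpListener());
    // Best-effort request to place this operator's container on a specific host.
    dag.setAttribute(listener, Context.OperatorContext.LOCALITY_HOST, "node17.example.com");
  }
}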
> >






Operator Node Affinity

2017-03-01 Thread Ganelin, Ilya
Hello, all – is there any way to deploy a given operator to a specific node? 
E.g. if I’m trying to create a listener for a TCP socket that can then pipe 
data to a DAG, is there any way for the location of that listener to be 
deterministic so an upstream entity knows what to connect to?

- Ilya Ganelin




Local mode execution

2017-03-01 Thread Ganelin, Ilya
Hi all, I’ve returned to writing Apex apps after a hiatus, and it seems that 
LocalMode is now deprecated, having been replaced by the Launcher interface. 
Is there an example or documentation anywhere of using this new API?

Please let me know, thanks!

- Ilya Ganelin




Running application built against custom Malhar version

2016-07-11 Thread Ganelin, Ilya
I’m attempting to build a new application leveraging a custom Malhar library. 
I’ve compiled the updated Malhar library, confirmed that the classes I’ve 
added are present, and added this specific library as a system dependency in 
the pom of the newly created application:


<dependency>
  <groupId>org.apache.apex</groupId>
  <artifactId>malhar-library</artifactId>
  <version>3.5.0-SNAPSHOT</version>
  <scope>system</scope>
  <systemPath>code/apex-malhar/library/target/malhar-library-3.5.0-SNAPSHOT.jar</systemPath>
</dependency>

However, when I then build this new application, upload the generated APA to 
the cluster, and attempt to run it with the DataTorrent UI, I get a 
NoClassDefFoundError. I would appreciate some help understanding what I’m 
doing incorrectly – I believe the dependency on the newly added class should 
be provided, since the APA is an uber-JAR. Am I missing something obvious 
here?

An error occurred trying to launch the application. Server message:
java.lang.NoClassDefFoundError: com/datatorrent/lib/stream/MultipleStreamMerger
    at com.example.merger.Application.populateDAG(Application.java:41)
    at com.datatorrent.stram.plan.logical.LogicalPlanConfiguration.prepareDAG(LogicalPlanConfiguration.java:1171)
    at com.datatorrent.stram.client.StramAppLauncher$1.createApp(StramAppLauncher.java:404)
    at com.datatorrent.stram.client.StramAppLauncher.launchApp(StramAppLauncher.java:479)
    at com.datatorrent.stram.cli.DTCli$LaunchCommand.execute(DTCli.java:2048)
    at com.datatorrent.stram.cli.DTCli.launchAppPackage(DTCli.java:3452)
    at com.datatorrent.stram.cli.DTCli.access$7000(DTCli.java:104)
    at com.datatorrent.stram.cli.DTCli$LaunchCommand.execute(DTCli.java:1893)
    at com.datatorrent.stram.cli.DTCli$3.run(DTCli.java:1447)
Caused by: java.lang.ClassNotFoundException: com.datatorrent.lib.stream.MultipleStreamMerger
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 9 more
Fatal error encountered







Proposed update to partitioning logic

2016-05-27 Thread Ganelin, Ilya
Hello all, as part of investigating 
https://issues.apache.org/jira/browse/APEXCORE-332 I dove pretty deep into 
the present implementation of the Partitioner and the classes that leverage 
it, including PartitionAwareSink and DefaultPartition.

Given that this is a rather core utility of the Apex code, as well as 
something that external implementations interface with, I fully understand 
both the difficulty and trepidation around making changes. I would 
particularly love feedback from the folks most familiar with it.

Current Implementation (my understanding)
The current approach to partitioning data is to define two objects: 1) A set of 
partition keys and 2) an integer mask. The basic idea is that there is an 
overarching set of partition keys, only some of which may be valid for a given 
port. Within the PartitionAwareSink, which handles routing of the data, the 
class applies this mask to a “code” generated by an individual piece of data 
via an & operation, and checks whether the set of partitions associated with a 
particular port will accept this data: 
partitions.contains(serde.getPartition(payload) & mask);
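
As a concrete illustration, here is a hypothetical, self-contained rendering of
that check; the names mirror those used above rather than the actual engine
classes, and the code value stands in for what serde.getPartition would return:

import java.util.HashSet;
import java.util.Set;

class MaskCheckExample
{
  public static void main(String[] args)
  {
    Set<Integer> partitions = new HashSet<>();
    partitions.add(1);  // this port accepts masked code 1
    int mask = 0x03;    // low two bits, i.e. four possible partitions
    int code = 13;      // e.g. serde.getPartition(payload)
    // 13 & 3 == 1, so this port accepts the tuple
    System.out.println(partitions.contains(code & mask));  // prints true
  }
}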

Problem:
The partitioning implementation is intimately tied in with the rest of the 
code. For example, the PartitionAwareSink presently needs to know about both 
the Set of partitions it is associated with and the keys it manages (via the 
mask) in order to accept a tuple. The StreamCodecWrapper also needs to know 
about both of these. The StreamingContainer stores and passes both these 
objects to the sub-classes that use it, e.g. Sinks.

Proposal:
Given the need to support different partitioning schemes, some of which may not 
be expressible with a mask-based approach (e.g. range partitioning, which allows 
partitioning data into a number of partitions that is not a power of 2), I 
propose abstracting the implementation of the actual partitioning scheme away 
from the components that use it. I believe this can be done with a fairly 
simple interface: a class (e.g. PartitionAssigner) that is aware of the 
internal structure of a given Partition implementation but exposes a single 
method to outside classes, boolean accept(T object). In the current 
implementation, it would use the mask + partitionKeys approach to compute 
partitions.contains(serde.getPartition(payload) & mask); in an alternative, 
range-based implementation, it could instead check each range: for (Range r : 
partitions) { r.contains(serde.getPartition(payload)); }.
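
A hedged sketch of what that interface could look like (PartitionAssigner and
RangeAssigner are proposed names from this email, not existing Apex classes;
ToIntFunction stands in for the stream codec):

import java.util.List;
import java.util.function.ToIntFunction;

interface PartitionAssigner<T>
{
  // True if the partition behind this assigner should receive the tuple.
  boolean accept(T tuple);
}

// Range-based assigner enabling splits that are not powers of two; each
// range is {lowInclusive, highExclusive} over the partition-code space.
class RangeAssigner<T> implements PartitionAssigner<T>
{
  private final List<int[]> ranges;
  private final ToIntFunction<T> codec; // stands in for serde.getPartition

  RangeAssigner(List<int[]> ranges, ToIntFunction<T> codec)
  {
    this.ranges = ranges;
    this.codec = codec;
  }

  @Override
  public boolean accept(T tuple)
  {
    int code = codec.applyAsInt(tuple);
    for (int[] r : ranges) {
      if (code >= r[0] && code < r[1]) {
        return true;
      }
    }
    return false;
  }
}

The current mask + partitionKeys logic would live behind the same accept
method, so callers like PartitionAwareSink would no longer need to know about
masks at all.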

This would necessitate updating the Partition class to add something akin to a 
“getPartitionAssigner” method and to remove the accessors for partitionKeys 
(which include the mask), and would thus affect existing implementations of the 
Partition class. Consequently, it may make sense to do this as part of the 
upcoming major 4.0 release. The advantage of doing this is that the code 
becomes much easier to parse and understand, and easier to maintain: in 
addition to having pluggable implementations of the Partitioner, we would also 
have pluggable implementations of Partition, along with the ability to handle 
complex partitioning schemes. For example, we could easily create partitioning 
schemes where one partition sees 25% of the data, another sees 50%, and a third 
sees 100%. This can be useful for evaluating the performance of an algorithm 
under different loads within the same application. We would also gain the 
ability to split data evenly across multiple partitions even when the number of 
partitions is not a power of two, as sketched below.
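
Reusing the RangeAssigner sketch above (a fragment, assuming java.util.Arrays
is imported and codec is the tuple-to-code function in scope), the 25%/50%/100%
scheme could be expressed with overlapping ranges over a 0-99 code space:

PartitionAssigner<String> quarter = new RangeAssigner<>(Arrays.asList(new int[]{0, 25}), codec);
PartitionAssigner<String> half = new RangeAssigner<>(Arrays.asList(new int[]{0, 50}), codec);
PartitionAssigner<String> full = new RangeAssigner<>(Arrays.asList(new int[]{0, 100}), codec);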

I would appreciate any feedback, and I’d be happy to whip together a PR 
demonstrating my proposal. My current PR, 
https://github.com/apache/incubator-apex-core/pull/347/files, demonstrates what 
a range-based Partition implementation would look like, but it also serves to 
show how many touch points there are when adding a different underlying 
implementation.




The information contained in this e-mail is confidential and/or proprietary to 
Capital One and/or its affiliates and may only be used solely in performance of 
work or services for Capital One. The information transmitted herewith is 
intended only for use by the individual or entity to which it is addressed. If 
the reader of this message is not the intended recipient, you are hereby 
notified that any review, retransmission, dissemination, distribution, copying 
or other use of, or taking of any action in reliance upon this information is 
strictly prohibited. If you have received this communication in error, please 
contact the sender and delete the material from your computer.


RE: A proposal for Malhar

2016-05-27 Thread Ganelin, Ilya
I think the checklist should be provided in the Malhar contribution guide as a 
template, and the contributor would be responsible for adding it to the created 
class and including it in the submission.
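
An illustrative sketch of what such an in-class checklist could look like (the
items here are hypothetical; the real list would be defined by the community as
proposed above):

/**
 * Operator maturity checklist (illustrative template only):
 *   [x] Compiles; core functionality demonstrated
 *   [ ] Idempotency verified
 *   [ ] Fault tolerance / checkpointing addressed
 *   [ ] Partitioning and scalability addressed
 *   [ ] Performance characterized
 */
public class MyNewOperator
{
}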



Sent with Good (www.good.com)

From: Amol Kekre 
Sent: Friday, May 27, 2016 11:37:25 AM
To: dev@apex.apache.org
Cc: d...@apex.incubator.apache.org
Subject: Re: A proposal for Malhar

A checklist is a good idea. At a top level, the mere existence of the code in
another dir should signify its maturity. As Thomas pointed out, contrib does
something similar and is used by other projects. I am all for a checklist as
long as it does not increase the cost of submission. For example: would a new
contributor know what to put in the checklist? Or maybe the checklist is posted
and we ask the contributor to check off the items done; or maybe the committer
who pulls in the code posts the checklist?

Thks,
Amol


On Fri, May 27, 2016 at 8:22 AM, Ganelin, Ilya 
wrote:

> If we're going to adopt partially completed code, I propose that every new
> Malhar operator then contain a checklist as a comment within the class.
>
> This checklist would be defined by the community and would track the
> current development state of the operator. That way there are no unexpected
> surprises.
>
>
>
> Sent with Good (www.good.com<http://www.good.com>)
> 
> From: Amol Kekre 
> Sent: Friday, May 27, 2016 10:13:06 AM
> To: dev@apex.apache.org
> Cc: d...@apex.incubator.apache.org
> Subject: Re: A proposal for Malhar
>
> This is a very good idea and will greatly help contributors. The
> requirements to submit code to this Malhar folder should be very minimal. A
> few that come to mind:
>
> - Should compile
> - License of the external lib (if any) should be an Apache-compliant license
>  // Need to see if this is part of ASF guidelines
>
> Everything else including naming, idempotency, performance, ... should be
> waived.
>
> Thks
> Amol
>
>
> On Thu, May 26, 2016 at 11:25 PM, Pramod Immaneni 
> wrote:
>
> > As you all know the continued success and growth of an open source
> project
> > is dependent on new members joining the project and contributing. This
> will
> > depend on how accessible the project is for new folks to make meaningful
> > contributions. For a project like ours where the code base has been in
> > development for years, it can be quite daunting for new members to just
> > pick up and make contributions. We need to find ways to make it easier
> for
> > people to do so. Malhar, namely the operator library, is an area where
> > people can contribute without requiring deep knowledge or expertise.
> >
> > We have seen operators take time to mature as evidenced by the road taken
> > by some of our commonly used operators to reach production quality. This
> is
> > due to the fact that apart from the core functionality the operator is
> > trying to implement there are many other aspects to address such as
> > performance, idempotency, processing semantics and scalability. It would
> be
> > difficult even for folks familiar with all these aspects to get
> everything
> > right the first time around and produce comprehensive operators let alone
> > first time contributors. At the same time operators cannot reach this
> > maturity level unless they get used in different scenarios and get a good
> > look from different people. In maturity I am also including API
> stability.
> >
> > I would like to propose creation of a space inside Malhar, a sub-folder,
> > where contributions can first go in if they are not fully ready and when
> > they mature can be moved out of the sub-folder into an existing module
> or a
> > new module of its own, the package paths can remain the same. The
> > evaluation bar for contributions into this space would be more permissive
> > than it is today; it would require that the functionality the operator was
> > developed for be working, but will not necessitate that all fault-tolerance
> > and scalability aspects be addressed. It will also allow new operators
> that
> > are variations of existing operators till such time as we can determine
> if
> > the new functionality can be subsumed by the original operator or it
> makes
> > sense for the new operator to exist as a separate entity. It will be
> up to
> > committers and contributors to work together and make the decisions as to
> > whether the individual contributions go into this space or are ready to
> > just go into the regular modules.
> >
> > What does everyone think?
> >
> > Thanks,
> > Pramod
> >

RE: A proposal for Malhar

2016-05-27 Thread Ganelin, Ilya
If we're going to adopt partially completed code, I propose that every new 
Malhar operator then contain a checklist as a comment within the class.

This checklist would be defined by the community and would track the current 
development state of the operator. That way there are no unexpected surprises.



Sent with Good (www.good.com)

From: Amol Kekre 
Sent: Friday, May 27, 2016 10:13:06 AM
To: dev@apex.apache.org
Cc: d...@apex.incubator.apache.org
Subject: Re: A proposal for Malhar

This is a very good idea and will greatly help contributors. The
requirements to submit code to this Malhar folder should be very minimal. A
few that come to mind:

- Should compile
- License of the external lib (if any) should be an Apache-compliant license
 // Need to see if this is part of ASF guidelines

Everything else including naming, idempotency, performance, ... should be
waived.

Thks
Amol


On Thu, May 26, 2016 at 11:25 PM, Pramod Immaneni 
wrote:

> As you all know the continued success and growth of an open source project
> is dependent on new members joining the project and contributing. This will
> depend on how accessible the project is for new folks to make meaningful
> contributions. For a project like ours where the code base has been in
> development for years, it can be quite daunting for new members to just
> pick up and make contributions. We need to find ways to make it easier for
> people to do so. Malhar, namely the operator library, is an area where
> people can contribute without requiring deep knowledge or expertise.
>
> We have seen operators take time to mature as evidenced by the road taken
> by some of our commonly used operators to reach production quality. This is
> due to the fact that apart from the core functionality the operator is
> trying to implement there are many other aspects to address such as
> performance, idempotency, processing semantics and scalability. It would be
> difficult even for folks familiar with all these aspects to get everything
> right the first time around and produce comprehensive operators let alone
> first time contributors. At the same time operators cannot reach this
> maturity level unless they get used in different scenarios and get a good
> look from different people. In maturity I am also including API stability.
>
> I would like to propose creation of a space inside Malhar, a sub-folder,
> where contributions can first go in if they are not fully ready and when
> they mature can be moved out of the sub-folder into an existing module or a
> new module of its own, the package paths can remain the same. The
> evaluation bar for contributions into this space would be more permissive
> than it is today; it would require that the functionality the operator was
> developed for be working, but will not necessitate that all fault-tolerance
> and scalability aspects be addressed. It will also allow new operators that
> are variations of existing operators till such time as we can determine if
> the new functionality can be subsumed by the original operator or it makes
> sense for the new operator to exist as a separate entity. It will be up to
> committers and contributors to work together and make the decisions as to
> whether the individual contributions go into this space or are ready to
> just go into the regular modules.
>
> What does everyone think?
>
> Thanks,
> Pramod
>




Re: IntelliJ and Netbeans code styles are missing

2016-05-26 Thread Ganelin, Ilya
Another question: are the CheckStyle settings packaged anywhere? There is 
obviously a CheckStyle check run by Jenkins; is that CheckStyle configuration 
published for Apex or Malhar?




On 5/24/16, 2:19 PM, "Ganelin, Ilya"  wrote:

>I just realized that these were moved to subdirectories but the associated 
>documentation was never updated.
>I’ll fix that.
>
>
>
>
>On 5/24/16, 2:16 PM, "Ganelin, Ilya"  wrote:
>
>>Hi all – I’ve been setting up a new dev environment and noticed that the 
>>apex-style.jar and apex-style.zip are missing from:
>>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/intellij
>>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/netbeans
>>
>>Did these get lost in the migration from Apache or did we decide to approach 
>>this a different way?
>>
>>In recent work I did in the Alluxio project, I created a utility that would 
>>package/unpack code-style settings, allowing changes to be tracked in GitHub 
>>while still providing a relatively automated process for setting up 
>>configurations. Would something like that make sense for Apex?
>>
>>https://github.com/Alluxio/alluxio/pull/3168
>>
>>
>>
>
>




Re: Get ApplicationId from inside Application

2016-05-25 Thread Ganelin, Ilya
Hi guys - when I try to retrieve the application ID with:
("{ApplicationId=" + dag.getValue(Context.DAGContext.APPLICATION_ID) + "}\n")
the applicationId is reported as null.



I’m trying to run this inside the “populateDAG” function. 

Am I doing something wrong?

On 5/19/16, 11:20 AM, "Gaurav Gupta"  wrote:

>In an operator you can get it using
>
>String appid = Context.OperatorContext.getValue(Context.DAGContext.APPLICATION_ID);
>
>On Thu, May 19, 2016 at 11:04 AM, David Yan  wrote:
>
>> Specifically, you can try:
>>
>>   String appid = dag.getValue(DAGContext.APPLICATION_ID);
>>
>> David
>>
>> On Thu, May 19, 2016 at 10:12 AM, Sandesh Hegde 
>> wrote:
>>
>> > Yes, use the conf object passed to StreamingApplication.
>> >
>> > On Thu, May 19, 2016, 10:09 AM Ganelin, Ilya <
>> ilya.gane...@capitalone.com>
>> > wrote:
>> >
>> > > Hi all – is it possible to retrieve the Apex application ID:
>> > > e.g. application_1463594017097_0024 within the Apex program? For example
>> > > from the DAG object or some other object?
>> > > 
>> > >
>> > >
>> >
>>

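
For reference, a minimal sketch of reading the ID from inside an operator once
the application is running (BaseOperator and OperatorContext are standard Apex
classes; the attribute lookup cascades from the operator context up to the DAG,
which is presumably why reading it eagerly inside populateDAG(), before the
application is launched, returns null):

import com.datatorrent.api.Context;
import com.datatorrent.api.Context.OperatorContext;
import com.datatorrent.common.util.BaseOperator;

public class AppIdAwareOperator extends BaseOperator
{
  private transient String appId;

  @Override
  public void setup(OperatorContext context)
  {
    // Resolved against the running DAG's attributes at setup time.
    appId = context.getValue(Context.DAGContext.APPLICATION_ID);
  }
}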



Re: IntelliJ and Netbeans code styles are missing

2016-05-24 Thread Ganelin, Ilya
I just realized that these were moved to subdirectories but the associated 
documentation was never updated.
I’ll fix that.




On 5/24/16, 2:16 PM, "Ganelin, Ilya"  wrote:

>Hi all – I’ve been setting up a new dev environment and noticed that the 
>apex-style.jar and apex-style.zip are missing from:
>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/intellij
>https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/netbeans
>
>Did these get lost in the migration from Apache or did we decide to approach 
>this a different way?
>
>In recent work I did in the Alluxio project, I created a utility that would 
>package/unpack code-style settings, allowing changes to be tracked in GitHub 
>while still providing a relatively automated process for setting up 
>configurations. Would something like that make sense for Apex?
>
>https://github.com/Alluxio/alluxio/pull/3168
>
>
>




IntelliJ and Netbeans code styles are missing

2016-05-24 Thread Ganelin, Ilya
Hi all – I’ve been setting up a new dev environment and noticed that the 
apex-style.jar and apex-style.zip are missing from:
https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/intellij
https://github.com/apache/incubator-apex-core/tree/master/misc/ide-templates/netbeans

Did these get lost in the migration from Apache or did we decide to approach 
this a different way?

In recent work I did in the Alluxio project, I created a utility that would 
package/unpack code-style settings, allowing changes to be tracked in GitHub 
while still providing a relatively automated process for setting up configurations. 
Would something like that make sense for Apex?

https://github.com/Alluxio/alluxio/pull/3168





Re: Apache Beam Integration

2016-05-23 Thread Ganelin, Ilya
Siyuan - this is great guidance. I’m going to start with a deep review of the 
Beam API and the existing Apex API to identify overlap and will iterate from 
there. 
JIRA here: https://issues.apache.org/jira/browse/APEXMALHAR-2099




On 5/20/16, 1:06 PM, "Siyuan Hua"  wrote:

>Hi Ilya,
>
>I totally agree we could work on these things in parallel. I think the
>next missing part is the common interface that could be used in both the
>Windowed Operators and the high-level API.
>Thinking from a stream application developer's perspective, the features we
>are missing (watermark, trigger, window) are just some configuration options
>in the Windowed Operator and some declarative API as part of the high-level
>API. We can come up with some common interfaces for both of them.
>
>I'm thinking about something like this
>
>class Window {
>  long begin;
>  long delta;
>}
>
>class WindowOption {
>  ...
>  static WindowOption InEvery(int delta, Unit unit);
>
>  static WindowOption InEvery(int delta, Unit unit, int slideDelta, Unit
>slideunit);
>
>}
>
>So this can be used in WindowedOperator:
>
>abstract class WindowedOperator {
>
>  private WindowOption[] options;
>
>  .
>
>}
>
>And this can also be use in high level API
>
>stream.apply(sum("val1"), By("key1"), InEvery(1, TimeUnit.Hour, 5,
>TimeUnit.Minute)...)
>
>When we try to design this common interface, we also need to spend time to
>do research on Beam to make sure the Beam dag can be easily translated to
>Apex terms by using these interfaces. I actually don't like the way Flink
>translates Beam to their dag, because their existing API is not so perfect a
>match to Beam (they actually did some rework/workarounds).
>Though I like and agree with Beam's programming model, I don't like the
>API either, because it is not that declarative (too many terms in the API
>names) or extensible (for example, we also allow people to add top-level
>transformation methods to the Stream API in the first cut). I think making
>the high-level API more SQL-flavoured is better for a broader range of
>audience.
>
>For now, I'm gonna work on this kind of interface, and Ilya, it would be
>appreciated if you could give me input and also some reviews from a Beam
>translation perspective.
>
>And also anyone who wants to work on the operator side is more than
>welcome to give feedback on and contribute to the common interface.
>
>
>Thanks,
>Siyuan
>
>
>
>
>
>
>On Fri, May 20, 2016 at 11:54 AM, Ganelin, Ilya wrote:
>
>>
>> Hi, all - I would like to dive into the Beam development effort in earnest
>> and drive this. There are already a number of complementary components
>> being worked on in parallel:
>>
>> 1) Windowed operators:
>> https://issues.apache.org/jira/browse/APEXMALHAR-2085
>> 2) A higher-level API
>> https://issues.apache.org/jira/browse/APEXMALHAR-1939
>> 3) What Siyuan highlighted as “next-phase” for the Streaming API (e.g.
>> Watermarks, triggers, and different windowing semantics)
>>
>> Given the wealth of activity, I would love some help understanding how I
>> could best focus my energy to get us integrated with Beam as quickly as
>> possible. If we still feel that there’s design work that needs to happen,
>> either on operator design or on the API, before we can move forward, I’d be
>> happy to help flesh that out. Alternately, if we could begin implementation
>> of things like an API for windowed operators or watermarks, I could begin
>> work on that.
>>
>> I think the higher-level API can definitely support the Beam work, and we
>> can likely base our development on that API. However, it also seems that
>> the other requirements for Beam, specifically watermarks and triggers,
>> could be implemented independent of that effort. The API effort is
>> complementary but ultimately targets community growth and ease of use for
>> Apex, rather than Beam support.
>>
>> Lastly, to Siyuan’s point below, I think it’s perfectly reasonable to
>> initially target using the Beam API to create an Apex DAG, rather than
>> worrying about how to convert generic Apex applications into the Beam
>> language.
>>
>> I look forward to hearing your thoughts!
>>
>>
>>
>>
>> On 5/12/16, 9:31 AM, "Thomas Weise"  wrote:
>>
>> >Created proxy JIRA:
>> >
>> >https://issues.apache.org/jira/browse/APEXMALHAR-2089
>> >
>> >
>> >On Wed, May 11, 2016 at 1:31 PM, Thomas Weise 
>> >wrote:
>> >
>> >> SQL -> Beam is a longer ter

Re: [VOTE] Apache Apex Malhar Release 3.4.0 (RC1)

2016-05-23 Thread Ganelin, Ilya
+1 (Binding)

Comments: Apache release tests failed on my machine with a multitude of errors 
related to @param notation.

java version "1.8.0_77"
OS X 10.11.4
PASSED:

* Verified md5 and SHA512 for apache-apex-malhar-3.4.0-source-release.tar.gz

* Verified md5 and SHA512 for apache-apex-malhar-3.4.0-source-release.zip 

* Verified presence of README.md, NOTICE, LICENSE, and CHANGELOG.md. A DISCLAIMER 
is not necessary since the project is no longer in the incubator. 

* No included binaries

* Verified Apex compilation

* Verified Malhar compilation

FAILED:
* Executed Apache release tests

* Apache release tests failed with numerous errors.  
 






On 5/20/16, 8:49 AM, "Thomas Weise"  wrote:

>Dear Community,
>
>Please vote on the following Apache Apex Malhar 3.4.0 release candidate.
>
>This release follows core 3.4.0, resolves 66 JIRA tickets and adds
>a number of exciting new features and enhancements, including:
>
>- First cut of the high level Java stream API
>- Large operator state management (embedded key/value storage)
>- Connectors for Apache NiFi
>- Connectors and checkpointing with Apache Geode
>- New operators for transform, projection, enrichment
>- Support for Avro and Parquet formats
>
>List of all issues fixed: https://s.apache.org/Gc1d
>
>This is a source release with binary artifacts published to Maven.
>
>Staging directory (new dist directory access still not sorted out):
>https://dist.apache.org/repos/dist/dev/incubator/apex/apache-apex-malhar-3.4.0-RC1/
>Source zip:
>https://dist.apache.org/repos/dist/dev/incubator/apex/apache-apex-malhar-3.4.0-RC1/apache-apex-malhar-3.4.0-source-release.zip
>Source tar.gz:
>https://dist.apache.org/repos/dist/dev/incubator/apex/apache-apex-malhar-3.4.0-RC1/apache-apex-malhar-3.4.0-source-release.tar.gz
>Maven staging repository:
>https://repository.apache.org/content/repositories/orgapacheapex-1015/
>
>Git source:
>https://git-wip-us.apache.org/repos/asf?p=incubator-apex-malhar.git;a=commit;h=refs/tags/v3.4.0-RC1
>(commit: d2596b00a61d5b760ce5e95782431115b992285c)
>
>PGP key:
>http://pgp.mit.edu:11371/pks/lookup?op=vindex&search=t...@apache.org
>KEYS file:
>https://dist.apache.org/repos/dist/release/incubator/apex/KEYS
>
>More information at:
>http://apex.apache.org
>
>Please try the release and vote; vote will be open for at least 72 hours.
>
>[ ] +1 approve (and what verification was done)
>[ ] -1 disapprove (and reason why)
>
>http://www.apache.org/foundation/voting.html
>
>How to verify release candidate:
>
>http://apex.apache.org/verification.html
>
>Thanks,
>Thomas

