[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37886531

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

That is true, but it might ease the problem a little bit if newly added tests try to use `collect`. And I doubt that we'll soon find somebody who will take care of this.
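To make the suggestion concrete, here is a hedged sketch of the test tail using `collect` instead of the disk write (`data` is the DataSet from the diff above; the asserted values follow the two-line input, and `assertEquals` and `java.util.List` are assumed imports):

```java
// Sketch only: keep results in memory via collect() instead of writeAsText().
List<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> result =
	data.collect();

assertEquals(2, result.size());
assertEquals("ABC", result.get(0).f0.getValue());
```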
[GitHub] flink pull request: [FLINK-2460] [runtime] Check parent state in i...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1051#issuecomment-134676740

Addressing the comment and merging this for 0.10 and 0.9.1.
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1057

[hotfix] Allow setting FLINK_CONF_DIR by hand

This makes it possible for users to set a per-job conf directory when using the one-flink-cluster-per-job mode on YARN, which enables, for example, per-job log settings. @uce This should probably also go into 0.9.1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink flink-conf-dir

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1057.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1057

commit d42e7df0110adfa4702de2fc2e31c85e8ecc0c18
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date: 2015-08-25T17:26:29Z

    [hotfix] Allow setting FLINK_CONF_DIR by hand
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37895597

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

Yes, you don't need the else here because the variable is set either through the environment or in the if block. Still, I'd prefer newlines, but it is maybe just a matter of taste here.
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/1056

[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.
- In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overwritten and returns a `FileOutputCommitter` by default.
- In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.

This PR uses the respective methods to obtain the correct `OutputCommitter`. Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if no custom committer is implemented or set by the user. I also added convenience methods to the constructors of the `mapred` wrappers to set the `OutputCommitter` in the `JobConf`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1056

commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske <fhue...@apache.org>
Date: 2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.
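For context, a hedged sketch of the two lookup paths the PR describes (both Hadoop methods exist in the respective APIs; the helper methods wrapping them here are illustrative only, not the PR's actual code):

```java
// mapreduce API: the OutputFormat supplies (and may override) its committer.
static org.apache.hadoop.mapreduce.OutputCommitter newApiCommitter(
		org.apache.hadoop.mapreduce.OutputFormat<?, ?> format,
		org.apache.hadoop.mapreduce.TaskAttemptContext context) throws Exception {
	return format.getOutputCommitter(context);
}

// mapred API: the committer is read from the JobConf; FileOutputCommitter is the default.
static org.apache.hadoop.mapred.OutputCommitter oldApiCommitter(
		org.apache.hadoop.mapred.JobConf conf) {
	return conf.getOutputCommitter();
}
```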
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37880533

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

If I'm not mistaken, then we wanted to avoid writing data to disk because this sometimes fails on Travis. Instead we should use `collect` to keep the data in memory.
[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134647030

+1 to merge, we should follow up on the Mini cluster and Curator shading separately
[jira] [Resolved] (FLINK-1011) Sometimes Flow/Stack Layout is not presented in Dashboard's history
[ https://issues.apache.org/jira/browse/FLINK-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-1011.
Resolution: Cannot Reproduce

Sometimes Flow/Stack Layout is not presented in Dashboard's history
-------------------------------------------------------------------
Key: FLINK-1011
URL: https://issues.apache.org/jira/browse/FLINK-1011
Project: Flink
Issue Type: Bug
Components: Webfrontend
Affects Versions: pre-apache-0.5
Environment: Mac OS X and Ubuntu Linux. OpenJDK 1.7.
Reporter: Asterios Katsifodimos
Priority: Minor

The flow/stack layout in the history of completed jobs does not show up (Stratosphere Dashboard). This does not always happen; sometimes you may get it to work. I just reproduced this with the WordCount Java example from the 0.5.1 version. The job runs successfully.
[jira] [Commented] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711610#comment-14711610 ]

Robert Metzger commented on FLINK-1158:
Looks like I fixed this with https://github.com/apache/flink/commit/354efec0f9da0fa03ea9b337b02a1a2a03a9ac16

Logging property files missing in project created by archetypes
----------------------------------------------------------------
Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.
[jira] [Assigned] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reassigned FLINK-1158:
Assignee: Robert Metzger

Logging property files missing in project created by archetypes
----------------------------------------------------------------
Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Assignee: Robert Metzger
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.
[jira] [Updated] (FLINK-2474) Occasional failures in PartitionedStateCheckpointingITCase
[ https://issues.apache.org/jira/browse/FLINK-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2474:
Labels: test-stability (was: )

Occasional failures in PartitionedStateCheckpointingITCase
----------------------------------------------------------
Key: FLINK-2474
URL: https://issues.apache.org/jira/browse/FLINK-2474
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Márton Balassi
Labels: test-stability

The error message:
{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 47.301 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase
runCheckpointedProgram(org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase) Time elapsed: 42.495 sec <<< FAILURE!
java.lang.AssertionError: expected:<86678900> but was:<3467156>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase.runCheckpointedProgram(PartitionedStateCheckpointingITCase.java:117)
{code}
The detailed CI logs: https://s3.amazonaws.com/archive.travis-ci.org/jobs/73928480/log.txt
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711456#comment-14711456 ]

ASF GitHub Bot commented on FLINK-2569:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37880533

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

If I'm not mistaken, then we wanted to avoid writing data to disk because this sometimes fails on Travis. Instead we should use `collect` to keep the data in memory.

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
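Until the fix lands, a hedged workaround sketch is to read the fields as boxed basic types and wrap them into Values in a subsequent map (illustrative only, not part of the patch; `filename` follows the snippet above):

{code}
// Workaround sketch: read boxed basic types, then wrap them into mutable Value types.
DataSet<Tuple2<Integer, Integer>> basic =
	env.readCsvFile(filename).types(Integer.class, Integer.class);

DataSet<Tuple2<IntValue, IntValue>> values = basic.map(
	new MapFunction<Tuple2<Integer, Integer>, Tuple2<IntValue, IntValue>>() {
		@Override
		public Tuple2<IntValue, IntValue> map(Tuple2<Integer, Integer> t) {
			return new Tuple2<IntValue, IntValue>(new IntValue(t.f0), new IntValue(t.f1));
		}
	});
{code}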
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711466#comment-14711466 ]

ASF GitHub Bot commented on FLINK-2089:
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134634885 LGTM +1

Buffer recycled IllegalStateException during cancelling
-------------------------------------------------------
Key: FLINK-2089
URL: https://issues.apache.org/jira/browse/FLINK-2089
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Fix For: 0.10, 0.9.1

[~rmetzger] reported the following stack trace during cancelling of high parallelism jobs:
{code}
Error: java.lang.IllegalStateException: Buffer has already been recycled.
	at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
	at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
	at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
	at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
	at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
This looks like a concurrent buffer pool release/buffer usage error. I'm investigating this today.
[GitHub] flink pull request: [FLINK-2089] [runtime] Fix illegal state in Re...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134634885 LGTM +1
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37881537

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+			inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+			new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+			inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+			new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.restoreState("100:1");
+
[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711510#comment-14711510 ]

ASF GitHub Bot commented on FLINK-2291:
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134646755

Looks very good. Minor comments that we may address after this pull request:
- The Flink Mini cluster becomes tricky, the configurations ever more opaque. This could use a rework.
- You shade Curator in Hadoop, but not in Flink. Do we expect collisions with other systems that use Curator, like newer versions of the Kafka consumers? (IIRC 0.8.3 starts using Curator.)

Use ZooKeeper to elect JobManager leader and send information to TaskManagers
------------------------------------------------------------------------------
Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 0.10

Use ZooKeeper to determine the leader of a set of {{JobManagers}} which will act as the responsible {{JobManager}} for all {{TaskManager}}s. The {{TaskManager}}s will get the address of the leader from ZooKeeper.

Related Wiki: [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]
[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134646755

Looks very good. Minor comments that we may address after this pull request:
- The Flink Mini cluster becomes tricky, the configurations ever more opaque. This could use a rework.
- You shade Curator in Hadoop, but not in Flink. Do we expect collisions with other systems that use Curator, like newer versions of the Kafka consumers? (IIRC 0.8.3 starts using Curator.)
[jira] [Resolved] (FLINK-557) debian: permissions and users
[ https://issues.apache.org/jira/browse/FLINK-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-557.
Resolution: Won't Fix

We don't provide a Debian image as download anymore.

debian: permissions and users
-----------------------------
Key: FLINK-557
URL: https://issues.apache.org/jira/browse/FLINK-557
Project: Flink
Issue Type: Bug
Reporter: GitHub Import
Labels: github-import
Fix For: pre-apache

Currently it seems as if all processes are run by the root user. For example, calling the following command from a normal user account leads to write problems for the logfiles:
```
/usr/share/stratosphere-dist/bin/stratosphere run -j /usr/share/stratosphere-dist/examples/stratosphere-java-examples-0.5-SNAPSHOT-WordCount.jar -a 16 file:///var/log/syslog file:///home/physikerwelt/out
```
An example of how to run services as the designated user can be found at https://github.com/physikerwelt/mathoid/tree/master/debian

Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/557
Created by: [physikerwelt|https://github.com/physikerwelt]
Labels:
Created at: Tue Mar 11 12:41:12 CET 2014
State: open
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711570#comment-14711570 ]

ASF GitHub Bot commented on FLINK-2569:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37886531

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

That is true, but it might ease the problem a little bit if newly added tests try to use `collect`. And I doubt that we'll soon find somebody who will take care of this.

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711586#comment-14711586 ]

ASF GitHub Bot commented on FLINK-2565:
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134663435 @fhueske Added the test you requested.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711594#comment-14711594 ]

ASF GitHub Bot commented on FLINK-2565:
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134665239 Thanks! Good to merge, IMO.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711607#comment-14711607 ]

ASF GitHub Bot commented on FLINK-2200:
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134667459

There is a Spark 2.11 artifact in Maven central. I think they are doing a similar thing as we are already doing with the hadoop1/hadoop2 versions: they generate specific pom files when deploying Spark to Maven central: https://github.com/apache/spark/blob/master/dev/change-scala-version.sh

Flink API with Scala 2.11 - Maven Repository
--------------------------------------------
Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the Maven repository.
[jira] [Commented] (FLINK-2189) NullPointerException in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711608#comment-14711608 ]

Ufuk Celebi commented on FLINK-2189:
[~Felix Neutatz], can you still reproduce this error after the recent fix in 627f3cbcfdca8368eea6aa825cd9a45a9a0a841f?

NullPointerException in MutableHashTable
----------------------------------------
Key: FLINK-2189
URL: https://issues.apache.org/jira/browse/FLINK-2189
Project: Flink
Issue Type: Bug
Components: Core
Reporter: Till Rohrmann

[~Felix Neutatz] reported a {{NullPointerException}} in the {{MutableHashTable}} when running the {{ALS}} algorithm. The stack trace is the following:
{code}
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1094)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:927)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:783)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544)
	at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
	at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
He produced this error on his local machine with the following code:
{code}
implicit val env = ExecutionEnvironment.getExecutionEnvironment

val links = MovieLensUtils.readLinks(movieLensDir + "links.csv")
val movies = MovieLensUtils.readMovies(movieLensDir + "movies.csv")
val ratings = MovieLensUtils.readRatings(movieLensDir + "ratings.csv")
val tags = MovieLensUtils.readTags(movieLensDir + "tags.csv")

val ratingMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt, r.rating) }
val testMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt) }

val als = ALS()
	.setIterations(10)
	.setNumFactors(10)
	.setBlocks(150)

als.fit(ratingMatrix)
val result = als.predict(testMatrix)
result.print

val risk = als.empiricalRisk(ratingMatrix).collect().apply(0)
println("Empirical risk: " + risk)

env.execute()
{code}
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37883013

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -120,12 +131,24 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 			OUT nextElement = serializer.createInstance();
 			nextElement = format.nextRecord(nextElement);
 			if (nextElement == null && splitIterator.hasNext()) {
-				format.open(splitIterator.next());
+				InputSplit split = splitIterator.next();
+				splitNumber = split.getSplitNumber();
+				currRecord = 0l;
+				format.open(split);
 				continue;
 			} else if (nextElement == null) {
 				break;
 			}
-			ctx.collect(nextElement);
+			if (splitNumber == checkpointedSplit) {
--- End diff --

What if you've checkpointed the 2nd split after seeing the 1st and 2nd splits, and now the source is re-executed with the first split? Aren't records written again because you only save the latest checkpointed split number?
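The `restoreState("100:1")` call in the accompanying test suggests the checkpoint string encodes "splitNumber:recordOffset". A hedged sketch of that encoding (method and field names here are illustrative, not the PR's actual `Checkpointed` interface signatures; `checkpointedRecord` is an assumed field):

```java
// Illustrative sketch only -- assumes the "split:record" encoding seen in the test.
// Snapshot the current split together with the offset of the last emitted record, so
// replay can skip exactly the records already emitted, provided splits are re-read in
// a deterministic order (the concern raised in the review comment above).
public String snapshotState() {
	return splitNumber + ":" + currRecord;
}

public void restoreState(String state) {
	String[] parts = state.split(":");
	checkpointedSplit = Integer.parseInt(parts[0]);  // split that was in progress
	checkpointedRecord = Long.parseLong(parts[1]);   // records to skip within that split
}
```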
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2429:
Fix Version/s: 0.10

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------
Key: FLINK-2429
URL: https://issues.apache.org/jira/browse/FLINK-2429
Project: Flink
Issue Type: Wish
Components: Streaming
Reporter: Stephan Ewen
Priority: Minor
Fix For: 0.10

I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134624663 Cool, that was quick ;)
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711509#comment-14711509 ]

ASF GitHub Bot commented on FLINK-2314:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37883013

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -120,12 +131,24 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 			OUT nextElement = serializer.createInstance();
 			nextElement = format.nextRecord(nextElement);
 			if (nextElement == null && splitIterator.hasNext()) {
-				format.open(splitIterator.next());
+				InputSplit split = splitIterator.next();
+				splitNumber = split.getSplitNumber();
+				currRecord = 0l;
+				format.open(split);
 				continue;
 			} else if (nextElement == null) {
 				break;
 			}
-			ctx.collect(nextElement);
+			if (splitNumber == checkpointedSplit) {
--- End diff --

What if you've checkpointed the 2nd split after seeing the 1st and 2nd splits, and now the source is re-executed with the first split? Aren't records written again because you only save the latest checkpointed split number?

Make Streaming File Sources Persistent
--------------------------------------
Key: FLINK-2314
URL: https://issues.apache.org/jira/browse/FLINK-2314
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Sheetal Parade
Labels: easyfix, starter

Streaming file sources should participate in the checkpointing. They should track the bytes they read from the file and checkpoint it. One can look at the sequence generating source function for an example of a checkpointed source.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711563#comment-14711563 ]

ASF GitHub Bot commented on FLINK-2200:
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134658771

Is the Maven shade plugin bug the reason why this fails:
```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (shade-hadoop) on project flink-yarn-tests_2.11: Error creating shaded jar: 3 problems were encountered while building the effective model for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]
[ERROR] [WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.flink:flink-yarn-tests${scala.suffix}:[unknown-version], /home/robert/incubator-flink/flink-yarn-tests/pom.xml, line 36, column 14
[ERROR] [WARNING] 'parent.relativePath' of POM org.apache.flink:flink-yarn-tests_2.11:[unknown-version] (/home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml) points at org.apache.flink:flink-yarn-tests${scala.suffix} instead of org.apache.flink:flink-parent${scala.suffix}, please verify your project structure @ line 3, column 11
[ERROR] [FATAL] Non-resolvable parent POM for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]: Could not find artifact org.apache.flink:flink-parent${scala.suffix}:pom:0.10-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) and 'parent.relativePath' points at wrong local POM @ line 3, column 11
[ERROR] for project org.apache.flink:flink-yarn-tests_2.11:[unknown-version] at /home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml
```
?

> About the shading artifacts, your guess is right. Because Hadoop packages don't need Scala dependencies, I didn't add the suffix to them. But if we need the suffix for them to maintain uniformity, we can add the suffix. How do you think?

I think it's fine to leave them as they are.

> As you see, there are property expressions (${scala.suffix}) in artifactId. I think that it can be a problem. How can I solve this?

Yes, that is certainly a problem. Also, the artifact for flink-parent is not created properly in my local Maven repository. Its name is now `flink-parent${scala.suffix}/`. Maybe we have to look at other projects which are doing the same... if there are any projects ;) Kafka, for example, is offering builds for different Scala versions. Sadly, they are using sbt for building their project. Spark doesn't deploy its _2.11 artifacts to Maven central.

Flink API with Scala 2.11 - Maven Repository
--------------------------------------------
Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the Maven repository.
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134658771

Is the Maven shade plugin bug the reason why this fails:
```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (shade-hadoop) on project flink-yarn-tests_2.11: Error creating shaded jar: 3 problems were encountered while building the effective model for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]
[ERROR] [WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.flink:flink-yarn-tests${scala.suffix}:[unknown-version], /home/robert/incubator-flink/flink-yarn-tests/pom.xml, line 36, column 14
[ERROR] [WARNING] 'parent.relativePath' of POM org.apache.flink:flink-yarn-tests_2.11:[unknown-version] (/home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml) points at org.apache.flink:flink-yarn-tests${scala.suffix} instead of org.apache.flink:flink-parent${scala.suffix}, please verify your project structure @ line 3, column 11
[ERROR] [FATAL] Non-resolvable parent POM for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]: Could not find artifact org.apache.flink:flink-parent${scala.suffix}:pom:0.10-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) and 'parent.relativePath' points at wrong local POM @ line 3, column 11
[ERROR] for project org.apache.flink:flink-yarn-tests_2.11:[unknown-version] at /home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml
```
?

> About the shading artifacts, your guess is right. Because Hadoop packages don't need Scala dependencies, I didn't add the suffix to them. But if we need the suffix for them to maintain uniformity, we can add the suffix. How do you think?

I think it's fine to leave them as they are.

> As you see, there are property expressions (${scala.suffix}) in artifactId. I think that it can be a problem. How can I solve this?

Yes, that is certainly a problem. Also, the artifact for flink-parent is not created properly in my local Maven repository. Its name is now `flink-parent${scala.suffix}/`. Maybe we have to look at other projects which are doing the same... if there are any projects ;) Kafka, for example, is offering builds for different Scala versions. Sadly, they are using sbt for building their project. Spark doesn't deploy its _2.11 artifacts to Maven central.
[GitHub] flink pull request: [FLINK-2089] [runtime] Fix illegal state in Re...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134676836

I will address the comment and merge this for 0.10 and 0.9.1.
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711654#comment-14711654 ]

ASF GitHub Bot commented on FLINK-2089:
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134676836 I will address the comment and merge this for 0.10 and 0.9.1.

Buffer recycled IllegalStateException during cancelling
-------------------------------------------------------
Key: FLINK-2089
URL: https://issues.apache.org/jira/browse/FLINK-2089
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Fix For: 0.10, 0.9.1

[~rmetzger] reported the following stack trace during cancelling of high parallelism jobs:
{code}
Error: java.lang.IllegalStateException: Buffer has already been recycled.
	at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
	at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
	at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
	at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
	at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
This looks like a concurrent buffer pool release/buffer usage error. I'm investigating this today.
[jira] [Commented] (FLINK-2460) ReduceOnNeighborsWithExceptionITCase failure
[ https://issues.apache.org/jira/browse/FLINK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711653#comment-14711653 ]

ASF GitHub Bot commented on FLINK-2460:
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1051#issuecomment-134676740 Addressing the comment and merging this for 0.10 and 0.9.1.

ReduceOnNeighborsWithExceptionITCase failure
--------------------------------------------
Key: FLINK-2460
URL: https://issues.apache.org/jira/browse/FLINK-2460
Project: Flink
Issue Type: Bug
Reporter: Sachin Goel
Assignee: Ufuk Celebi

I noticed a build error due to failure on this case. It was on a branch of my fork, which didn't actually have anything to do with the failed test or the runtime system at all. Here's the error log: https://s3.amazonaws.com/archive.travis-ci.org/jobs/73695554/log.txt
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1053#issuecomment-134584979

@fhueske Thanks for review. :) I addressed your comments.
* Add `getBasicAndBasicValueTupleTypeInfo` method into `TupleTypeInfo`
* Add `isBasicValueType` method into `ValueTypeInfo` class to check whether the type is a basic value or not
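With those two additions in place, the call pattern from the JIRA description should work directly; a minimal hedged sketch:

```java
// After this change, Value types are accepted by CsvReader.types(...) directly.
DataSet<Tuple2<IntValue, IntValue>> ds =
	env.readCsvFile(filename).types(IntValue.class, IntValue.class);
```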
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134639235 Looks good, +1 to merge
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37894302

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

Maybe just code style, but could you make this more explicit using if-else blocks?
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1057#issuecomment-134679865

Very useful feature. In addition, I could also imagine that the config file could be passed as a parameter to the ./bin/flink utility.
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37894520

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

What would be the else block?
[GitHub] flink pull request: [scripts] resolve base path of symlinked execu...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1049#issuecomment-134592174

Corresponding issue in order to track fixed issues for the upcoming release: https://issues.apache.org/jira/browse/FLINK-2572
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134618042 @StephanEwen I've reimplemented hashCode() and compare() accordingly.
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711416#comment-14711416 ]

ASF GitHub Bot commented on FLINK-2565:
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134618042 @StephanEwen I've reimplemented hashCode() and compare() accordingly.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Updated] (FLINK-2538) Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
[ https://issues.apache.org/jira/browse/FLINK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2538:
Affects Version/s: master

Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
------------------------------------------------------------------------
Key: FLINK-2538
URL: https://issues.apache.org/jira/browse/FLINK-2538
Project: Flink
Issue Type: Bug
Affects Versions: master
Reporter: Ted Yu
Priority: Minor
Fix For: 0.10

In ClassLoaderUtil#getUserCodeClassLoaderInfo() around line 76:
{code}
else {
	try {
		new JarFile(filePath);
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code}
The JarFile isn't closed before returning, leading to a potential resource leak.
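A hedged sketch of the straightforward fix, closing the probe `JarFile` with try-with-resources (Java 7+); this mirrors the snippet above and is not necessarily the actual patch:

{code}
else {
	// try-with-resources closes the JarFile whether or not probing succeeds
	try (JarFile jar = new JarFile(filePath)) {
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code}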
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711427#comment-14711427 ]

ASF GitHub Bot commented on FLINK-2569:
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1053#issuecomment-134621653 Looks good to merge :+1:

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711345#comment-14711345 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869769 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. + */ +public class GSATriangleCount implements + GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> { + + @Override + public DataSet<Tuple1<Integer>> run(Graph<Long, NullValue, NullValue> input) throws Exception { + + ExecutionEnvironment env = input.getContext(); + + // order the edges so that src is always higher than trg + DataSet<Edge<Long, NullValue>> edges = input.getEdges() + .map(new OrderEdges()).distinct(); --- End diff -- this call to `distinct()` here means that basically if you have 2 edges a-b and b-a in the input, and they are both part of a triangle, then you only count it once? Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges.
The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
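To make the ordering step concrete, a mapper in the spirit of the `OrderEdges` call quoted in the diff above could look like the sketch below; this is illustrative only, not the PR's code, and it shows why a-b and b-a collapse under `distinct()`:
{code}
// Illustrative sketch: flip each edge so the source id is always the larger
// endpoint. Afterwards a-b and b-a map to the same edge, so distinct() keeps
// one copy and such a triangle is counted once.
public static final class OrderEdges
		implements MapFunction<Edge<Long, NullValue>, Edge<Long, NullValue>> {
	@Override
	public Edge<Long, NullValue> map(Edge<Long, NullValue> edge) {
		return edge.getSource() > edge.getTarget()
				? edge
				: new Edge<>(edge.getTarget(), edge.getSource(), edge.getValue());
	}
}
{code}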
[jira] [Commented] (FLINK-2543) State handling does not support deserializing classes through the UserCodeClassloader
[ https://issues.apache.org/jira/browse/FLINK-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711343#comment-14711343 ] ASF GitHub Bot commented on FLINK-2543: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1048#issuecomment-134600109 I made the SerializedThrowable an Exception and removed the `JobFailure` message again. State handling does not support deserializing classes through the UserCodeClassloader - Key: FLINK-2543 URL: https://issues.apache.org/jira/browse/FLINK-2543 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Blocker Fix For: 0.10, 0.9.1 The current implementation of the state checkpointing does not support custom classes, because the UserCodeClassLoader is not used to deserialize the state. {code} Error: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state. at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: com.ottogroup.bi.searchlab.searchsessionizer.OperatorState at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83) at org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276) at org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541) {code} The issue has been reported by a user: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Class-for-state-checkpointing-td2415.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
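The standard technique behind this kind of fix is to resolve classes against the user-code classloader during deserialization; a generic sketch of that pattern, not the committed patch:
{code}
// Generic sketch: resolveClass() consults the supplied (user-code)
// classloader instead of the system classloader that produced the
// ClassNotFoundException above.
import java.io.*;

class UserCodeObjectInputStream extends ObjectInputStream {
	private final ClassLoader userCodeClassLoader;

	UserCodeObjectInputStream(InputStream in, ClassLoader cl) throws IOException {
		super(in);
		this.userCodeClassLoader = cl;
	}

	@Override
	protected Class<?> resolveClass(ObjectStreamClass desc)
			throws IOException, ClassNotFoundException {
		return Class.forName(desc.getName(), false, userCodeClassLoader);
	}
}
{code}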
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711354#comment-14711354 ] ASF GitHub Bot commented on FLINK-2565: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134603166 I think you currently pay quite a performance price for implementing the compare methods in such a way that they delegate to the `ByteComparator` and `FloatComparator`. In order to call those, every single byte or float needs to be boxed. That can be avoided if the method directly implements the array comparisons on the primitive types. Support primitive arrays as keys Key: FLINK-2565 URL: https://issues.apache.org/jira/browse/FLINK-2565 Project: Flink Issue Type: Improvement Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
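A boxing-free variant for the `byte[]` case could look like the following sketch (names are illustrative, not the pull request's code):
{code}
// Compares the primitives directly instead of routing each element through
// ByteComparator, so no Byte boxing occurs.
public static int compareByteArrays(byte[] a, byte[] b) {
	int len = Math.min(a.length, b.length);
	for (int i = 0; i < len; i++) {
		if (a[i] != b[i]) {
			return a[i] < b[i] ? -1 : 1;
		}
	}
	return a.length - b.length; // on a common prefix, the shorter array sorts first
}
{code}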
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37867925 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/utils/TriangleCountData.java --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.example.utils; + +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Edge; +import org.apache.flink.types.NullValue; + +import java.util.ArrayList; +import java.util.List; + +/** + * Provides the default data sets used for the Triangle Count example. --- End diff -- There is no example :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711388#comment-14711388 ] ASF GitHub Bot commented on FLINK-2200: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r37874193 --- Diff: flink-staging/flink-gelly/pom.xml --- @@ -37,17 +37,17 @@ under the License. <dependencies> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> + <artifactId>flink-java${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-clients</artifactId> + <artifactId>flink-clients${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-test-utils</artifactId> + <artifactId>flink-test-utils${scala.suffix}</artifactId> <version>${project.version}</version> <scope>test</scope> </dependency> --- End diff -- I can not comment below this line, but you forgot the `${scala.suffix}` for the `flink-optimizer`. Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r37874193 --- Diff: flink-staging/flink-gelly/pom.xml --- @@ -37,17 +37,17 @@ under the License. <dependencies> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> + <artifactId>flink-java${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-clients</artifactId> + <artifactId>flink-clients${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-test-utils</artifactId> + <artifactId>flink-test-utils${scala.suffix}</artifactId> <version>${project.version}</version> <scope>test</scope> </dependency> --- End diff -- I can not comment below this line, but you forgot the `${scala.suffix}` for the `flink-optimizer`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-2538) Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
[ https://issues.apache.org/jira/browse/FLINK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2538: --- Fix Version/s: 0.10 Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo() --- Key: FLINK-2538 URL: https://issues.apache.org/jira/browse/FLINK-2538 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ted Yu Priority: Minor Fix For: 0.10 In ClassLoaderUtil#getUserCodeClassLoaderInfo() around line 76: {code}
else {
	try {
		new JarFile(filePath);
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code} The JarFile isn't closed before returning, leading to potential resource leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException
[ https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1989: --- Fix Version/s: (was: 0.9) 0.10 Sorting of POJO data set from TableEnv yields NotSerializableException -- Key: FLINK-1989 URL: https://issues.apache.org/jira/browse/FLINK-1989 Project: Flink Issue Type: Bug Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Aljoscha Krettek Fix For: 0.10 Sorting or grouping (or probably any other key operation) on a POJO data set that was created by a {{TableEnvironment}} yields a {{NotSerializableException}} due to a non-serializable {{java.lang.reflect.Field}} object. I traced the error back to the {{ExpressionSelectFunction}}. I guess that a {{TypeInformation}} object is stored in the generated user-code function. A {{PojoTypeInfo}} holds Field objects, which cannot be serialized. The following test can be pasted into the {{SelectITCase}} and reproduces the problem. {code}
@Test
public void testGroupByAfterTable() throws Exception {
	ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
	TableEnvironment tableEnv = new TableEnvironment();

	DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
	Table in = tableEnv.toTable(ds, "a,b,c");

	Table result = in
		.select("a, b, c");

	DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
	resultSet
		.sortPartition("a", Order.DESCENDING)
		.writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
	env.execute();

	expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" + "4,3,Hello world, " +
		"how are you?\n" + "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" + "7,4," +
		"Comment#1\n" + "8,4,Comment#2\n" + "9,4,Comment#3\n" + "10,4,Comment#4\n" + "11,5," +
		"Comment#5\n" + "12,5,Comment#6\n" + "13,5,Comment#7\n" + "14,5,Comment#8\n" + "15,5," +
		"Comment#9\n" + "16,6,Comment#10\n" + "17,6,Comment#11\n" + "18,6,Comment#12\n" + "19," +
		"6,Comment#13\n" + "20,6,Comment#14\n" + "21,6,Comment#15\n";
}

public static class ABC {
	public int a;
	public long b;
	public String c;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
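The root cause is easy to demonstrate in isolation; a self-contained sketch (independent of Flink) showing that `java.lang.reflect.Field` fails Java serialization:
{code}
// Field does not implement Serializable, so any object graph holding one
// (such as a PojoTypeInfo) cannot be serialized with plain Java serialization.
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Field;

public class FieldNotSerializableDemo {
	public int someField;

	public static void main(String[] args) throws Exception {
		Field f = FieldNotSerializableDemo.class.getDeclaredField("someField");
		try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
			out.writeObject(f); // throws java.io.NotSerializableException
		}
	}
}
{code}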
[GitHub] flink pull request: [FLINK-2386] Add new Kafka Consumer for Flink ...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1055#issuecomment-134599517 The tests in this pull request might fail because the fixes to the BufferBarrier are not backported to 0.9 yet. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134590137 You're right @tillrohrmann. I updated the JIRA to also contain a small description :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2555) Hadoop Input/Output Formats are unable to access secured HDFS clusters
[ https://issues.apache.org/jira/browse/FLINK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711458#comment-14711458 ] ASF GitHub Bot commented on FLINK-2555: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1038#issuecomment-134632899 I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573 Hadoop Input/Output Formats are unable to access secured HDFS clusters -- Key: FLINK-2555 URL: https://issues.apache.org/jira/browse/FLINK-2555 Project: Flink Issue Type: Bug Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Critical It seems that authentication tokens are not passed correctly to the input format when accessing secured HDFS clusters. Exception {code} org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job b319a28f62855917901cfb67c5457142 (Flink Java Job at Thu Aug 20 10:46:41 PDT 2015) at org.apache.flink.client.program.Client.run(Client.java:413) at org.apache.flink.client.program.Client.run(Client.java:356) at org.apache.flink.client.program.Client.run(Client.java:349) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63) at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789) at org.apache.flink.api.java.DataSet.collect(DataSet.java:408) at org.apache.flink.api.java.DataSet.print(DataSet.java:1346) at de.robertmetzger.WordCount.main(WordCount.java:73) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:315) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290) at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:873) at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:870) at org.apache.flink.runtime.security.SecurityUtils$1.run(SecurityUtils.java:50) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.flink.runtime.security.SecurityUtils.runSecured(SecurityUtils.java:47) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:870) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922) Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job b319a28f62855917901cfb67c5457142 (Flink Java Job at Thu Aug 20 10:46:41 PDT 2015) at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:594) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at
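The usual Hadoop-side mechanism for this is running the filesystem access under a `UserGroupInformation` that carries the delegation tokens, as the stack trace's `SecurityUtils.runSecured` frames hint at; a generic sketch of that pattern, not the Flink fix itself:
{code}
// Generic Hadoop pattern: code run inside doAs() sees the credentials/tokens
// attached to the UGI, which is what a secured HDFS client requires.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class SecuredAccessSketch {
	public static void main(String[] args) throws Exception {
		UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
		ugi.doAs(new PrivilegedExceptionAction<Void>() {
			@Override
			public Void run() throws Exception {
				// open the Hadoop input/output format against secured HDFS here
				return null;
			}
		});
	}
}
{code}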
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711360#comment-14711360 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134604410 Thanks for adding this one @andralungu. We really need this algorithm in the library! I have a few comments:
- we should clarify what the expected input graph format and the output are. If I'm not mistaken, it seems you're expecting a directed graph without edge duplicates and you count triangles ignoring edge direction. Is that correct? I would add a clear comment in the usage description about that. If we want to do this even better, we could even add a graph validator for the input.
- I'm not quite sure what happens when the graph has opposite direction edges, i.e. a-b and b-a, that are both part of a triangle. I would expect that this triangle would be counted twice, but it seems to me that you're only counting it once. Is there a reason for that?
- as you've been experimenting with this for a while, could you let us know how much better this is than your vertex-centric version? Is it always the case? If not, do you think it would make sense to add both implementations in the library and let the users choose?
Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134604410 Thanks for adding this one @andralungu. We really need this algorithm in the library! I have a few comments:
- we should clarify what the expected input graph format and the output are. If I'm not mistaken, it seems you're expecting a directed graph without edge duplicates and you count triangles ignoring edge direction. Is that correct? I would add a clear comment in the usage description about that. If we want to do this even better, we could even add a graph validator for the input.
- I'm not quite sure what happens when the graph has opposite direction edges, i.e. a-b and b-a, that are both part of a triangle. I would expect that this triangle would be counted twice, but it seems to me that you're only counting it once. Is there a reason for that?
- as you've been experimenting with this for a while, could you let us know how much better this is than your vertex-centric version? Is it always the case? If not, do you think it would make sense to add both implementations in the library and let the users choose?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-987) Extend TypeSerializers and -Comparators to work directly on Memory Segments
[ https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-987: -- Fix Version/s: (was: 0.9) 0.10 Extend TypeSerializers and -Comparators to work directly on Memory Segments --- Key: FLINK-987 URL: https://issues.apache.org/jira/browse/FLINK-987 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.6-incubating Reporter: Stephan Ewen Assignee: Aljoscha Krettek Fix For: 0.10 As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest to change the way that the TypeSerializers/Comparators and DataInputViews/DataOutputViews work. The goal is to allow more flexibility in the construction of the binary representation of data types, and to allow partial deserialization of individual fields. Both are currently prohibited by the fact that the abstraction of the memory (into which the data goes) is a stream abstraction ({{DataInputView}}, {{DataOutputView}}). An idea is to offer a random-access buffer-like view for construction and random-access deserialization, as well as various methods to copy elements in a binary fashion between such buffers and streams. A possible set of methods for the {{TypeSerializer}} could be: {code}
long serialize(T record, TargetBuffer buffer);
T deserialize(T reuse, SourceBuffer source);
void ensureBufferSufficientlyFilled(SourceBuffer source);
<X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
void copy(DataInputView in, TargetBuffer buffer);
void copy(SourceBuffer buffer, DataOutputView out);
void copy(DataInputView source, DataOutputView target);
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
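A toy illustration of the partial-deserialization idea behind `deserializeField`/`getOffsetForField`, with `java.nio.ByteBuffer` standing in for the proposed `SourceBuffer` abstraction (an assumption for the sketch, not Flink API):
{code}
// With fixed-width fields, a single field can be read at a computed offset
// without deserializing the whole record.
import java.nio.ByteBuffer;

public class PartialReadDemo {
	// assumed record layout: [int id (4 bytes)][long value (8 bytes)]
	static long readValueField(ByteBuffer buffer, int recordOffset) {
		return buffer.getLong(recordOffset + 4); // skip the int field
	}

	public static void main(String[] args) {
		ByteBuffer buf = ByteBuffer.allocate(12);
		buf.putInt(42).putLong(123456789L);
		System.out.println(readValueField(buf, 0)); // prints 123456789
	}
}
{code}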
[jira] [Updated] (FLINK-1610) Java docs do not build
[ https://issues.apache.org/jira/browse/FLINK-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1610: --- Fix Version/s: (was: 0.9) 0.10 Java docs do not build -- Key: FLINK-1610 URL: https://issues.apache.org/jira/browse/FLINK-1610 Project: Flink Issue Type: Bug Components: Build System, Documentation Affects Versions: 0.9 Reporter: Maximilian Michels Fix For: 0.10 Among a bunch of warnings, I get the following error which prevents the java doc generation from finishing: {code} javadoc: error - com.sun.tools.doclets.internal.toolkit.util.DocletAbortException: com.sun.tools.javac.code.Symbol$CompletionFailure: class file for akka.testkit.TestKit not found Command line was: /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/bin/javadoc -Xdoclint:none @options @packages at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeJavadocCommandLine(AbstractJavadocMojo.java:5074) at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeReport(AbstractJavadocMojo.java:1999) at org.apache.maven.plugin.javadoc.JavadocReport.generate(JavadocReport.java:130) at org.apache.maven.plugin.javadoc.JavadocReport.execute(JavadocReport.java:315) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1778) Improve normalized keys in composite key case
[ https://issues.apache.org/jira/browse/FLINK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711362#comment-14711362 ] Stephan Ewen commented on FLINK-1778: - Has not been addressed Improve normalized keys in composite key case - Key: FLINK-1778 URL: https://issues.apache.org/jira/browse/FLINK-1778 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 Currently, if we have a key (String, long), the String will take up the entire normalized key space, without being fully discerning anyways. Limiting the key prefix in size and giving space to the second key field should in most cases improve the comparison efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1778) Improve normalized keys in composite key case
[ https://issues.apache.org/jira/browse/FLINK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1778: --- Fix Version/s: (was: 0.9) 0.10 Improve normalized keys in composite key case - Key: FLINK-1778 URL: https://issues.apache.org/jira/browse/FLINK-1778 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Currently, if we have a key (String, long), the String will take up the entire normalized key space, without being fully discerning anyways. Limiting the key prefix in size and giving space to the second key field should in most cases improve the comparison efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/992#discussion_r37879401 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java --- @@ -42,11 +42,13 @@ private boolean retryForever; private Socket socket; private static final int CONNECTION_TIMEOUT_TIME = 0; - private static final int CONNECTION_RETRY_SLEEP = 1000; + public static int CONNECTION_RETRY_SLEEP = 1000; --- End diff -- This shouldn't be modifiable by everyone. Please make it just package-visible by removing the `public` modifier. Also, please keep the `final` modifier because the current implementation just lets the number of retries be configurable with a fixed 1 second retry rate. This is also documented in the user-facing API methods on DataStream. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
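Spelled out, the declaration the review asks for would be the following (a sketch of the suggestion, not the merged change):
{code}
// Package-private (so tests in the same package can read it) and final, so the
// 1-second retry rate stays fixed while only the retry count is configurable.
static final int CONNECTION_RETRY_SLEEP = 1000;
{code}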
[jira] [Updated] (FLINK-1297) Add support for tracking statistics of intermediate results
[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1297: --- Fix Version/s: (was: 0.9) 0.10 Add support for tracking statistics of intermediate results --- Key: FLINK-1297 URL: https://issues.apache.org/jira/browse/FLINK-1297 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Alexander Alexandrov Assignee: Alexander Alexandrov Fix For: 0.10 Original Estimate: 1,008h Remaining Estimate: 1,008h One of the major problems related to the optimizer at the moment is the lack of proper statistics. With the introduction of staged execution, it is possible to instrument the runtime code with a statistics facility that collects the required information for optimizing the next execution stage. I would therefore like to contribute code that can be used to gather basic statistics for the (intermediate) result of dataflows (e.g. min, max, count, count distinct) and make them available to the job manager. Before I start, I would like to hear some feedback from the other users. In particular, to handle skew (e.g. on grouping) it might be good to have some sort of detailed sketch about the key distribution of an intermediate result. I am not sure whether a simple histogram is the most effective way to go. Maybe somebody would propose another lightweight sketch that provides better accuracy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2528) DriverTestBase tests fail spuriously
[ https://issues.apache.org/jira/browse/FLINK-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2528. --- DriverTestBase tests fail spuriously Key: FLINK-2528 URL: https://issues.apache.org/jira/browse/FLINK-2528 Project: Flink Issue Type: Bug Reporter: Sachin Goel Assignee: Stephan Ewen Fix For: 0.10 MatchTaskTest fails with a Null Pointer exception. Here's the log: https://s3.amazonaws.com/archive.travis-ci.org/jobs/75780253/log.txt and the relevant parts of trace: {code} Exception in thread "Thread-154" java.lang.AssertionError: Canceling task failed: java.lang.NullPointerException at org.apache.flink.runtime.operators.testutils.DriverTestBase.cancel(DriverTestBase.java:271) at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:60) at org.junit.Assert.fail(Assert.java:88) at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:68) {code} {code} "Thread-153" prio=10 tid=0x7fc1e1338800 nid=0x5cd6 waiting on condition [0x7fc1d2b1] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.flink.runtime.operators.MatchTaskTest$MockDelayingMatchStub.join(MatchTaskTest.java:984) at org.apache.flink.runtime.operators.MatchTaskTest$MockDelayingMatchStub.join(MatchTaskTest.java:978) at org.apache.flink.runtime.operators.sort.AbstractMergeIterator.crossMwithNValues(AbstractMergeIterator.java:297) at org.apache.flink.runtime.operators.sort.AbstractMergeIterator.crossMatchingGroup(AbstractMergeIterator.java:146) at org.apache.flink.runtime.operators.sort.AbstractMergeInnerJoinIterator.callWithNextKey(AbstractMergeInnerJoinIterator.java:105) at org.apache.flink.runtime.operators.JoinDriver.run(JoinDriver.java:208) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriverInternal(DriverTestBase.java:208) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriver(DriverTestBase.java:174) at org.apache.flink.runtime.operators.MatchTaskTest$3.run(MatchTaskTest.java:520) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711338#comment-14711338 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869349 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. --- End diff -- The implementation is not iterative indeed, but I wouldn't call it single-pass. Single-pass means that each edge/vertex of the graph is only read and processed once. That would be the case, e.g. in a streaming implementation. I would simply remove the 2nd sentence here :) Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869349 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. --- End diff -- The implementation is not iterative indeed, but I wouldn't call it single-pass. Single-pass means that each edge/vertex of the graph is only read and processed once. That would be the case, e.g. in a streaming implementation. I would simply remove the 2nd sentence here :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134603166 I think you currently pay quite a performance price for implementing the compare methods in such a way that they delegate to the `ByteComparator` and `FloatComparator`. In order to call those, every single byte or float needs to be boxed. That can be avoided if the method directly implements the array comparisons on the primitive types. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869769 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. + */ +public class GSATriangleCount implements + GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> { + + @Override + public DataSet<Tuple1<Integer>> run(Graph<Long, NullValue, NullValue> input) throws Exception { + + ExecutionEnvironment env = input.getContext(); + + // order the edges so that src is always higher than trg + DataSet<Edge<Long, NullValue>> edges = input.getEdges() + .map(new OrderEdges()).distinct(); --- End diff -- this call to `distinct()` here means that basically if you have 2 edges a-b and b-a in the input, and they are both part of a triangle, then you only count it once? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Reopened] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
[ https://issues.apache.org/jira/browse/FLINK-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reopened FLINK-2427: I've reopened this issue to track back porting this for 0.9.1. Allow the BarrierBuffer to maintain multiple queues of blocked inputs - Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10, 0.9.1 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
[ https://issues.apache.org/jira/browse/FLINK-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2427: --- Fix Version/s: 0.9.1 Allow the BarrierBuffer to maintain multiple queues of blocked inputs - Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10, 0.9.1 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1129) The Plan Visualizer Cuts of the Lower Part of Certain Operators
[ https://issues.apache.org/jira/browse/FLINK-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711309#comment-14711309 ] Ufuk Celebi commented on FLINK-1129: This has been fixed with the new runtime interface for 0.10. I don't think that it will be back ported to 0.9.1. If someone has time to do this, please change the fix version accordingly and port it back. The Plan Visualizer Cuts of the Lower Part of Certain Operators --- Key: FLINK-1129 URL: https://issues.apache.org/jira/browse/FLINK-1129 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 0.10 Attachments: screenshot-1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
[ https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711340#comment-14711340 ] ASF GitHub Bot commented on FLINK-2386: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1055#issuecomment-134599517 The tests in this pull request might fail because the fixes to the BufferBarrier are not backported to 0.9 yet. Implement Kafka connector using the new Kafka Consumer API -- Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.10, 0.9.1 Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2542) It should be documented that it is required from a join key to override hashCode(), when it is not a POJO
[ https://issues.apache.org/jira/browse/FLINK-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2542. Resolution: Won't Fix I think it's OK to assume that people follow the general Object contract: {code} Note that it is generally necessary to override the {@code hashCode} method whenever this method is overridden, so as to maintain the general contract for the {@code hashCode} method, which states that equal objects must have equal hash codes. {code} If more people run into this, we can revisit this issue. It should be documented that it is required from a join key to override hashCode(), when it is not a POJO - Key: FLINK-2542 URL: https://issues.apache.org/jira/browse/FLINK-2542 Project: Flink Issue Type: Bug Components: Gelly, Java API Reporter: Gabor Gevay Priority: Minor Fix For: 0.10, 0.9.1 If the join key is not a POJO, and does not override hashCode, then the join silently fails (produces empty output). I don't see this documented anywhere. The Gelly documentation should also have this info separately, because it does joins internally on the vertex IDs, but the user might not know this, or might not look at the join documentation when using Gelly. Here is an example code: {noformat}
public static class ID implements Comparable<ID> {
	public long foo; //no default ctor -- not a POJO

	public ID(long foo) {
		this.foo = foo;
	}

	@Override
	public int compareTo(ID o) {
		return ((Long)foo).compareTo(o.foo);
	}

	@Override
	public boolean equals(Object o0) {
		if(o0 instanceof ID) {
			ID o = (ID)o0;
			return foo == o.foo;
		} else {
			return false;
		}
	}

	@Override
	public int hashCode() {
		return 42;
	}
}

public static void main(String[] args) throws Exception {
	ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

	DataSet<Tuple2<ID, Long>> inDegrees = env.fromElements(Tuple2.of(new ID(123l), 4l));
	DataSet<Tuple2<ID, Long>> outDegrees = env.fromElements(Tuple2.of(new ID(123l), 5l));

	DataSet<Tuple3<ID, Long, Long>> degrees = inDegrees.join(outDegrees, JoinOperatorBase.JoinHint.REPARTITION_HASH_FIRST).where(0).equalTo(0)
		.with(new FlatJoinFunction<Tuple2<ID, Long>, Tuple2<ID, Long>, Tuple3<ID, Long, Long>>() {
			@Override
			public void join(Tuple2<ID, Long> first, Tuple2<ID, Long> second, Collector<Tuple3<ID, Long, Long>> out) {
				out.collect(new Tuple3<ID, Long, Long>(first.f0, first.f1, second.f1));
			}
		}).withForwardedFieldsFirst("f0;f1").withForwardedFieldsSecond("f1");

	System.out.println("degrees count: " + degrees.count());
}
{noformat} This prints 1, but if I comment out the hashCode, it prints 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
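For anyone hitting this, a contract-conforming replacement for the example's `hashCode()` derives the code from the same field that `equals` compares (an illustrative drop-in for the `ID` class above):
{code}
// Equal foo values now produce equal hash codes, so hash-partitioned joins
// route matching keys to the same partition.
@Override
public int hashCode() {
	return (int) (foo ^ (foo >>> 32));
}
{code}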
[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
[ https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711439#comment-14711439 ] ASF GitHub Bot commented on FLINK-2490: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/992#discussion_r37879401 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java --- @@ -42,11 +42,13 @@ private boolean retryForever; private Socket socket; private static final int CONNECTION_TIMEOUT_TIME = 0; - private static final int CONNECTION_RETRY_SLEEP = 1000; + public static int CONNECTION_RETRY_SLEEP = 1000; --- End diff -- This shouldn't be modifiable by everyone. Please make it just package-visible by removing the `public` modifier. Also, please keep the `final` modifier because the current implementation just lets the number of retries be configurable with a fixed 1 second retry rate. This is also documented in the user-facing API methods on DataStream. Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket --- Key: FLINK-2490 URL: https://issues.apache.org/jira/browse/FLINK-2490 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2097) Add support for JobSessions
[ https://issues.apache.org/jira/browse/FLINK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2097: --- Fix Version/s: (was: 0.9) 0.10 Add support for JobSessions --- Key: FLINK-2097 URL: https://issues.apache.org/jira/browse/FLINK-2097 Project: Flink Issue Type: Sub-task Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Maximilian Michels Fix For: 0.10 Sessions make sure that the JobManager does not immediately discard a JobGraph after execution, but keeps it around for further operations to be attached to the graph. By keeping the JobGraph around, the cached streams (intermediate data) are also kept, That is the way of realizing interactive sessions on top of a streaming dataflow abstraction. ExecutionGraphs should be kept as long as - no timeout occurred or - the session has not been explicitly ended -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1129) The Plan Visualizer Cuts Off the Lower Part of Certain Operators
[ https://issues.apache.org/jira/browse/FLINK-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1129: --- Fix Version/s: (was: 0.9) 0.10

The Plan Visualizer Cuts Off the Lower Part of Certain Operators

Key: FLINK-1129
URL: https://issues.apache.org/jira/browse/FLINK-1129
Project: Flink
Issue Type: Bug
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10
Attachments: screenshot-1.png

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711308#comment-14711308 ] ASF GitHub Bot commented on FLINK-2570: ---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1054#discussion_r37867925

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/utils/TriangleCountData.java ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example.utils;
+
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.types.NullValue;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Provides the default data sets used for the Triangle Count example.
--- End diff --

There is no example :)

Add a Triangle Count Library Method

Key: FLINK-2570
URL: https://issues.apache.org/jira/browse/FLINK-2570
Project: Flink
Issue Type: Task
Components: Gelly
Affects Versions: 0.10
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor

The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases:

1) Select neighbours with id greater than the current vertex id.
   Gather: no-op
   Sum: create a set out of these neighbours
   Apply: attach the computed values to the vertices
2) Propagate each received value to neighbours with higher id (again using GSA)
3) Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
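For intuition, the higher-id-neighbour trick the three phases rely on can be sketched in a few lines of plain Java. This is a hedged single-machine illustration, not the Gelly GSA implementation: every triangle {a, b, c} with a < b < c is counted exactly once, at its smallest vertex.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriangleCountSketch {

	/** Counts triangles in an undirected graph given as a symmetric adjacency map. */
	public static long countTriangles(Map<Long, Set<Long>> adjacency) {
		long triangles = 0;
		for (Map.Entry<Long, Set<Long>> entry : adjacency.entrySet()) {
			long vertex = entry.getKey();
			// Phase 1: keep only neighbours with an id greater than the current vertex.
			List<Long> higher = new ArrayList<Long>();
			for (long neighbour : entry.getValue()) {
				if (neighbour > vertex) {
					higher.add(neighbour);
				}
			}
			// Phases 2/3: a triangle exists whenever two higher-id neighbours
			// are themselves connected by an edge.
			for (int i = 0; i < higher.size(); i++) {
				for (int j = i + 1; j < higher.size(); j++) {
					if (adjacency.get(higher.get(i)).contains(higher.get(j))) {
						triangles++;
					}
				}
			}
		}
		return triangles;
	}
}
{code}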
[jira] [Updated] (FLINK-1240) We cannot use sortGroup on a global reduce
[ https://issues.apache.org/jira/browse/FLINK-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1240: --- Fix Version/s: (was: 0.9) 0.10

We cannot use sortGroup on a global reduce

Key: FLINK-1240
URL: https://issues.apache.org/jira/browse/FLINK-1240
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Priority: Minor
Fix For: 0.10

This is only an API problem, I hope. I also know that this is potentially a very bad idea because everything must be sorted on one node. In some cases, such as sorted first-n, this would make sense, though, since there we use a combiner.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1278) Remove the Record special code paths
[ https://issues.apache.org/jira/browse/FLINK-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1278: --- Assignee: (was: Kostas Tzoumas)

Remove the Record special code paths

Key: FLINK-1278
URL: https://issues.apache.org/jira/browse/FLINK-1278
Project: Flink
Issue Type: Bug
Components: Local Runtime
Affects Versions: 0.8.0
Reporter: Stephan Ewen
Priority: Minor
Fix For: 0.10

There are some legacy Record code paths in the runtime, which maintainers often forget to keep in sync and which cause errors if people actually use Records.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1851) Java Table API does not support Casting
[ https://issues.apache.org/jira/browse/FLINK-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1851: --- Fix Version/s: (was: 0.9) 0.10

Java Table API does not support Casting

Key: FLINK-1851
URL: https://issues.apache.org/jira/browse/FLINK-1851
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711430#comment-14711430 ] ASF GitHub Bot commented on FLINK-2200: ---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134623006

@rmetzger Thanks! I addressed your comment and rebased on master.

Flink API with Scala 2.11 - Maven Repository

Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711432#comment-14711432 ] ASF GitHub Bot commented on FLINK-2200: ---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134624663

Cool, that was quick ;)

Flink API with Scala 2.11 - Maven Repository

Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134623006

@rmetzger Thanks! I addressed your comment and rebased on master.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1053#discussion_r37884384

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
 
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+				env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

@tillrohrmann Yeah, I know that. But to use `collect` instead of writing to disk, we need to change all test methods in `CsvReaderITCase`. Maybe we can cover this in another issue (FLINK-2032).

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
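For reference, a hedged sketch of what the `collect`-based variant of this test could look like, assuming the surrounding `CsvReaderITCase` from the diff above. A real test should sort the result or compare independent of row order, since `collect` gives no ordering guarantee:

{code}
// Sketch only: replaces the writeAsText/resultPath comparison with collect().
// Expected values are taken from the inputData string in the test above.
List<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> result = data.collect();

Assert.assertEquals(2, result.size());
Assert.assertEquals("ABC", result.get(0).f0.getValue());
Assert.assertTrue(result.get(0).f1.getValue());
{code}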
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711534#comment-14711534 ] ASF GitHub Bot commented on FLINK-2569: ---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1053#discussion_r37884384

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
 
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+				env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

@tillrohrmann Yeah, I know that. But to use `collect` instead of writing to disk, we need to change all test methods in `CsvReaderITCase`. Maybe we can cover this in another issue (FLINK-2032).

CsvReader support for ValueTypes

Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:

{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}

When specifying a ValueType, i.e.

{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}

the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).

{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1426) JobManager AJAX requests sometimes fail
[ https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1426. Resolution: Invalid

This is superseded by the new web front end. I didn't see any recent progress in the repo you've linked. If this is still ongoing, we will have to sync it with the ongoing work to refactor the web interface.

JobManager AJAX requests sometimes fail

Key: FLINK-1426
URL: https://issues.apache.org/jira/browse/FLINK-1426
Project: Flink
Issue Type: Bug
Components: JobManager, Webfrontend
Reporter: Robert Metzger

It seems that the JobManager sometimes (I think when accessing it for the first time) does not show the number of TMs / slots. A simple workaround is re-loading the page, but users are still complaining about it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1639) Document the Flink deployment scripts to make sure others know how to make a release
[ https://issues.apache.org/jira/browse/FLINK-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1639. Resolution: Fixed Assignee: (was: Márton Balassi)

Documentation has been added in https://cwiki.apache.org/confluence/display/FLINK/Releasing. Max also added more comments to the script itself.

Document the Flink deployment scripts to make sure others know how to make a release

Key: FLINK-1639
URL: https://issues.apache.org/jira/browse/FLINK-1639
Project: Flink
Issue Type: Task
Components: release
Reporter: Henry Saputra

Currently, only Robert knows the details of the Flink deployment and release scripts that support both Hadoop versions. We need to document the black magic used in the scripts to make sure others know how the flow works, just in case we need to push a release and Robert is not available.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2504) ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed spuriously
[ https://issues.apache.org/jira/browse/FLINK-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2504: --- Labels: test-stability (was: )

ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed spuriously

Key: FLINK-2504
URL: https://issues.apache.org/jira/browse/FLINK-2504
Project: Flink
Issue Type: Bug
Reporter: Till Rohrmann
Labels: test-stability

The test {{ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed}} failed in one of my Travis builds: https://travis-ci.org/tillrohrmann/flink/jobs/74881883

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881554

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.restoreState("100:1");
+
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711480#comment-14711480 ] ASF GitHub Bot commented on FLINK-2314: ---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881537

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
[jira] [Commented] (FLINK-2480) Improving test coverage for org.apache.flink.streaming.api
[ https://issues.apache.org/jira/browse/FLINK-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711469#comment-14711469 ] ASF GitHub Bot commented on FLINK-2480: ---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-134635720

@HuangWHWHW Can you access the CI reports now? Has the Travis team fixed the problem?

Improving test coverage for org.apache.flink.streaming.api

Key: FLINK-2480
URL: https://issues.apache.org/jira/browse/FLINK-2480
Project: Flink
Issue Type: Test
Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Fix For: 0.10
Original Estimate: 504h
Remaining Estimate: 504h

The streaming API is quite a bit newer than the other code, so it is not as well covered with tests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2480][test]Add tests for PrintSinkFunct...
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-134635720

@HuangWHWHW Can you access the CI reports now? Has the Travis team fixed the problem?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711513#comment-14711513 ] ASF GitHub Bot commented on FLINK-2291: ---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-134647030

+1 to merge, we should follow up on the Mini cluster and Curator shading separately

Use ZooKeeper to elect JobManager leader and send information to TaskManagers

Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 0.10

Use ZooKeeper to determine the leader of a set of {{JobManagers}}, which will act as the responsible {{JobManager}} for all {{TaskManagers}}. The {{TaskManagers}} will get the address of the leader from ZooKeeper.

Related Wiki: [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
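For background, the leader-election building block itself is small when expressed with Apache Curator (the library the shading remark above refers to). A hedged sketch; the connection string and znode path are invented for illustration and are not Flink's actual configuration:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
	public static void main(String[] args) throws Exception {
		// Illustrative values, not Flink's real config keys or paths.
		CuratorFramework client = CuratorFrameworkFactory.newClient(
				"localhost:2181", new ExponentialBackoffRetry(1000, 3));
		client.start();

		LeaderLatch latch = new LeaderLatch(client, "/flink/leader-latch");
		latch.start();
		latch.await(); // blocks until this participant is elected leader

		// A leading JobManager would now publish its address under a
		// well-known znode for the TaskManagers to read; omitted here.
		System.out.println("has leadership: " + latch.hasLeadership());

		latch.close();
		client.close();
	}
}
{code}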
[jira] [Resolved] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1158. Resolution: Fixed Fix Version/s: 0.9

The current archetypes have a logging property file.

Logging property files missing in project created by archetypes

Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1043#issuecomment-134665239

Thanks! Good to merge, IMO.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (FLINK-1929) Add code to cleanly stop a running streaming topology
[ https://issues.apache.org/jira/browse/FLINK-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1929. Resolution: Fixed

Add code to cleanly stop a running streaming topology

Key: FLINK-1929
URL: https://issues.apache.org/jira/browse/FLINK-1929
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.9
Reporter: Robert Metzger

Right now it's not possible to cleanly stop a running streaming topology. Cancelling the job will cancel all operators, but for proper exactly-once processing from Kafka sources, we need to provide a way to stop the sources first, wait until all remaining tuples have been processed, and then shut down the sources (so that they can commit the right offset to Zookeeper).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
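The stop-then-drain pattern the issue asks for, sketched against the real SourceFunction interface. The poll and offset-commit helpers are hypothetical stand-ins for the Kafka-specific parts:

{code}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch: a source whose loop exits on request, so in-flight tuples drain
// downstream before the source commits its offset and shuts down.
public class CleanlyStoppableSource implements SourceFunction<String> {

	private volatile boolean running = true;

	@Override
	public void run(SourceContext<String> ctx) throws Exception {
		while (running) {
			String record = pollNextRecord(); // hypothetical helper
			if (record != null) {
				ctx.collect(record);
			}
		}
		commitOffsetToZooKeeper(); // hypothetical helper: commit the right offset
	}

	@Override
	public void cancel() {
		// A dedicated stop() with drain semantics is what the issue proposes;
		// cancel() is the closest existing hook.
		running = false;
	}

	private String pollNextRecord() { return null; } // placeholder
	private void commitOffsetToZooKeeper() { }       // placeholder
}
{code}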
[jira] [Updated] (FLINK-2276) Travis build error
[ https://issues.apache.org/jira/browse/FLINK-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2276: --- Labels: test-stability (was: )

Travis build error

Key: FLINK-2276
URL: https://issues.apache.org/jira/browse/FLINK-2276
Project: Flink
Issue Type: Bug
Reporter: Sachin Goel
Labels: test-stability

testExecutionFailsAfterTaskMarkedFailed failed on Travis. Here is the log output: https://s3.amazonaws.com/archive.travis-ci.org/jobs/68288986/log.txt

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2189) NullPointerException in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711620#comment-14711620 ] Till Rohrmann commented on FLINK-2189: ---

[~JonathanH5] encountered this problem recently.

NullPointerException in MutableHashTable

Key: FLINK-2189
URL: https://issues.apache.org/jira/browse/FLINK-2189
Project: Flink
Issue Type: Bug
Components: Core
Reporter: Till Rohrmann

[~Felix Neutatz] reported a {{NullPointerException}} in the {{MutableHashTable}} when running the {{ALS}} algorithm. The stack trace is the following:

{code}
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1094)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:927)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:783)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544)
	at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
	at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}

He produced this error on his local machine with the following code:

{code}
implicit val env = ExecutionEnvironment.getExecutionEnvironment

val links = MovieLensUtils.readLinks(movieLensDir + "links.csv")
val movies = MovieLensUtils.readMovies(movieLensDir + "movies.csv")
val ratings = MovieLensUtils.readRatings(movieLensDir + "ratings.csv")
val tags = MovieLensUtils.readTags(movieLensDir + "tags.csv")

val ratingMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt, r.rating) }
val testMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt) }

val als = ALS()
  .setIterations(10)
  .setNumFactors(10)
  .setBlocks(150)

als.fit(ratingMatrix)
val result = als.predict(testMatrix)
result.print

val risk = als.empiricalRisk(ratingMatrix).collect().apply(0)
println("Empirical risk: " + risk)

env.execute()
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881478

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
--- End diff --

Why do you print the stack trace instead of simply letting the exception bubble up? Is this an expected test exception?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2555] Properly pass security credential...
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1038#issuecomment-134632899

I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2394) HadoopOutputFormat OutputCommitter defaults to FileOutputCommitter
[ https://issues.apache.org/jira/browse/FLINK-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711474#comment-14711474 ] ASF GitHub Bot commented on FLINK-2394: ---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/1056

[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.
- In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overwritten and returns a `FileOutputCommitter` by default.
- In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.

This PR uses the respective methods to obtain the correct `OutputCommitter`. Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if no custom committer is implemented or set by the user. I also added convenience methods to the constructors of the `mapred` wrappers to set the `OutputCommitter` in the `JobConf`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1056

commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske fhue...@apache.org
Date: 2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

HadoopOutputFormat OutputCommitter defaults to FileOutputCommitter

Key: FLINK-2394
URL: https://issues.apache.org/jira/browse/FLINK-2394
Project: Flink
Issue Type: Bug
Components: Hadoop Compatibility
Affects Versions: 0.9.0
Reporter: Stefano Bortoli
Assignee: Fabian Hueske
Fix For: 0.10, 0.9.1

MongoOutputFormat does not write back to the collection because the HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter and defaults to FileOutputCommitter. Therefore, on close and globalFinalize execution the commit does not happen and the mongo collection stays untouched.

A simple solution would be to:
1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat that gets the OutputCommitter as a parameter
2 - change the outputCommitter field of HadoopOutputFormatBase to be a generic OutputCommitter
3 - remove the default assignment of the outputCommitter to FileOutputCommitter in open() and finalizeGlobal(), or keep it as a default in case of no specific assignment.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
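To illustrate the `mapreduce`-API side of the fix: a custom OutputFormat supplies its own committer via `getOutputCommitter()`, which the corrected wrapper now respects instead of hard-coding `FileOutputCommitter`. A hedged sketch; all non-Hadoop class names are made up:

{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative only: MongoLikeOutputFormat / MongoLikeCommitter are made-up names.
public class MongoLikeOutputFormat extends OutputFormat<Object, Object> {

	@Override
	public RecordWriter<Object, Object> getRecordWriter(TaskAttemptContext context)
			throws IOException, InterruptedException {
		throw new UnsupportedOperationException("omitted in this sketch");
	}

	@Override
	public void checkOutputSpecs(JobContext context) {
		// no-op for the sketch
	}

	@Override
	public OutputCommitter getOutputCommitter(TaskAttemptContext context) {
		// This is the hook the fixed wrapper calls, so a custom committer
		// (e.g. one that commits to MongoDB) is actually used.
		return new MongoLikeCommitter();
	}

	private static class MongoLikeCommitter extends OutputCommitter {
		@Override public void setupJob(JobContext jobContext) { }
		@Override public void setupTask(TaskAttemptContext taskContext) { }
		@Override public boolean needsTaskCommit(TaskAttemptContext taskContext) { return true; }
		@Override public void commitTask(TaskAttemptContext taskContext) { /* commit writes here */ }
		@Override public void abortTask(TaskAttemptContext taskContext) { }
	}
}
{code}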
[jira] [Resolved] (FLINK-2478) The page “FlinkML - Machine Learning for Flink” https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/ contains a dead link
[ https://issues.apache.org/jira/browse/FLINK-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2478. Resolution: Fixed

Fixed via e68c86f.

The page “FlinkML - Machine Learning for Flink” https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/ contains a dead link

Key: FLINK-2478
URL: https://issues.apache.org/jira/browse/FLINK-2478
Project: Flink
Issue Type: Task
Components: Documentation
Affects Versions: 0.10
Reporter: Slim Baltagi
Assignee: Till Rohrmann
Priority: Minor

"Note that FlinkML is currently not part of the binary distribution. See linking with it for cluster execution here." The word 'here' links to a dead URL:
https://ci.apache.org/projects/flink/flink-docs-master/libs/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution

The correct link is:
https://ci.apache.org/projects/flink/flink-docs-master/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution

-- This message was sent by Atlassian JIRA (v6.3.4#6332)