[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37886531

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

That is true, but it might ease the problem a little bit if newly added tests try to use `collect`. And I doubt that we'll soon find somebody who will take care of this.
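To make the suggestion concrete, here is a hedged sketch of the test tail using `collect` instead of the disk write (`data` is the DataSet from the diff above; the asserted values follow the two-line input, and `assertEquals` and `java.util.List` are assumed imports):

```java
// Sketch only: keep results in memory via collect() instead of writeAsText().
List<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> result =
	data.collect();

assertEquals(2, result.size());
assertEquals("ABC", result.get(0).f0.getValue());
```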
[GitHub] flink pull request: [FLINK-2460] [runtime] Check parent state in i...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1051#issuecomment-134676740

Addressing the comment and merging this for 0.10 and 0.9.1.
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
GitHub user aljoscha opened a pull request: https://github.com/apache/flink/pull/1057

[hotfix] Allow setting FLINK_CONF_DIR by hand

This makes it possible for users to set a per-job conf directory when using the one-flink-cluster-per-job mode on YARN, which enables, for example, per-job log settings. @uce This should probably also go into 0.9.1.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aljoscha/flink flink-conf-dir

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1057.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1057

commit d42e7df0110adfa4702de2fc2e31c85e8ecc0c18
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date: 2015-08-25T17:26:29Z

    [hotfix] Allow setting FLINK_CONF_DIR by hand
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37895597

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

Yes, you don't need the else here because the variable is set either through the environment or in the if block. Still, I'd prefer newlines, but it is maybe just a matter of taste here.
[GitHub] flink pull request: [FLINK-2394] [fix] HadoopOutputFormats use cor...
GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/1056

[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.
- In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overwritten and returns a `FileOutputCommitter` by default.
- In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.

This PR uses the respective methods to obtain the correct `OutputCommitter`. Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if no custom committer is implemented or set by the user. I also added convenience methods to the constructors of the `mapred` wrappers to set the `OutputCommitter` in the `JobConf`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1056

commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske <fhue...@apache.org>
Date: 2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.
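For context, a hedged sketch of the two lookup paths the PR describes (both Hadoop methods exist in the respective APIs; the helper methods wrapping them here are illustrative only, not the PR's actual code):

```java
// mapreduce API: the OutputFormat supplies (and may override) its committer.
static org.apache.hadoop.mapreduce.OutputCommitter newApiCommitter(
		org.apache.hadoop.mapreduce.OutputFormat<?, ?> format,
		org.apache.hadoop.mapreduce.TaskAttemptContext context) throws Exception {
	return format.getOutputCommitter(context);
}

// mapred API: the committer is read from the JobConf; FileOutputCommitter is the default.
static org.apache.hadoop.mapred.OutputCommitter oldApiCommitter(
		org.apache.hadoop.mapred.JobConf conf) {
	return conf.getOutputCommitter();
}
```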
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37880533

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

If I'm not mistaken, then we wanted to avoid writing data to disk because this sometimes fails on Travis. Instead we should use `collect` to keep the data in memory.
[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134647030

+1 to merge, we should follow up on the Mini cluster and Curator shading separately
[jira] [Resolved] (FLINK-1011) Sometimes Flow/Stack Layout is not presented in Dashboard's history
[ https://issues.apache.org/jira/browse/FLINK-1011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-1011.
Resolution: Cannot Reproduce

Sometimes Flow/Stack Layout is not presented in Dashboard's history
-------------------------------------------------------------------
Key: FLINK-1011
URL: https://issues.apache.org/jira/browse/FLINK-1011
Project: Flink
Issue Type: Bug
Components: Webfrontend
Affects Versions: pre-apache-0.5
Environment: Mac OS X and Ubuntu Linux. OpenJDK 1.7.
Reporter: Asterios Katsifodimos
Priority: Minor

The flow/stack layout in the history of completed jobs does not show up (Stratosphere Dashboard). This does not always happen; sometimes you may get it to work. I just reproduced this with the WordCount Java example from the 0.5.1 version. The job runs successfully.
[jira] [Commented] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711610#comment-14711610 ]

Robert Metzger commented on FLINK-1158:
Looks like I fixed this with https://github.com/apache/flink/commit/354efec0f9da0fa03ea9b337b02a1a2a03a9ac16

Logging property files missing in project created by archetypes
----------------------------------------------------------------
Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.
[jira] [Assigned] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Metzger reassigned FLINK-1158:
Assignee: Robert Metzger

Logging property files missing in project created by archetypes
----------------------------------------------------------------
Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Assignee: Robert Metzger
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.
[jira] [Updated] (FLINK-2474) Occasional failures in PartitionedStateCheckpointingITCase
[ https://issues.apache.org/jira/browse/FLINK-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2474:
Labels: test-stability (was: )

Occasional failures in PartitionedStateCheckpointingITCase
----------------------------------------------------------
Key: FLINK-2474
URL: https://issues.apache.org/jira/browse/FLINK-2474
Project: Flink
Issue Type: Bug
Components: Streaming
Affects Versions: 0.10
Reporter: Stephan Ewen
Assignee: Márton Balassi
Labels: test-stability

The error message:
{code}
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 47.301 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase
runCheckpointedProgram(org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase) Time elapsed: 42.495 sec <<< FAILURE!
java.lang.AssertionError: expected:<86678900> but was:<3467156>
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.failNotEquals(Assert.java:743)
	at org.junit.Assert.assertEquals(Assert.java:118)
	at org.junit.Assert.assertEquals(Assert.java:144)
	at org.apache.flink.test.checkpointing.PartitionedStateCheckpointingITCase.runCheckpointedProgram(PartitionedStateCheckpointingITCase.java:117)
{code}
The detailed CI logs: https://s3.amazonaws.com/archive.travis-ci.org/jobs/73928480/log.txt
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711456#comment-14711456 ]

ASF GitHub Bot commented on FLINK-2569:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37880533

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

If I'm not mistaken, then we wanted to avoid writing data to disk because this sometimes fails on Travis. Instead we should use `collect` to keep the data in memory.

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
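Until the fix lands, a hedged workaround sketch is to read the fields as boxed basic types and wrap them into Values in a subsequent map (illustrative only, not part of the patch; `filename` follows the snippet above):

{code}
// Workaround sketch: read boxed basic types, then wrap them into mutable Value types.
DataSet<Tuple2<Integer, Integer>> basic =
	env.readCsvFile(filename).types(Integer.class, Integer.class);

DataSet<Tuple2<IntValue, IntValue>> values = basic.map(
	new MapFunction<Tuple2<Integer, Integer>, Tuple2<IntValue, IntValue>>() {
		@Override
		public Tuple2<IntValue, IntValue> map(Tuple2<Integer, Integer> t) {
			return new Tuple2<IntValue, IntValue>(new IntValue(t.f0), new IntValue(t.f1));
		}
	});
{code}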
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711466#comment-14711466 ]

ASF GitHub Bot commented on FLINK-2089:
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134634885 LGTM +1

Buffer recycled IllegalStateException during cancelling
-------------------------------------------------------
Key: FLINK-2089
URL: https://issues.apache.org/jira/browse/FLINK-2089
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Fix For: 0.10, 0.9.1

[~rmetzger] reported the following stack trace during cancelling of high parallelism jobs:
{code}
Error: java.lang.IllegalStateException: Buffer has already been recycled.
	at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
	at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
	at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
	at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
	at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
This looks like a concurrent buffer pool release/buffer usage error. I'm investigating this today.
[GitHub] flink pull request: [FLINK-2089] [runtime] Fix illegal state in Re...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134634885 LGTM +1
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37881537

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+			inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+			new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+			inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+			new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.restoreState("100:1");
+
[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711510#comment-14711510 ]

ASF GitHub Bot commented on FLINK-2291:
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134646755

Looks very good. Minor comments that we may address after this pull request:
- The Flink Mini cluster becomes tricky, the configurations ever more opaque. This could use a rework.
- You shade Curator in Hadoop, but not in Flink. Do we expect collisions with other systems that use Curator, like newer versions of the Kafka consumers? (IIRC 0.8.3 starts using Curator.)

Use ZooKeeper to elect JobManager leader and send information to TaskManagers
------------------------------------------------------------------------------
Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 0.10

Use ZooKeeper to determine the leader of a set of {{JobManagers}} which will act as the responsible {{JobManager}} for all {{TaskManager}}s. The {{TaskManager}}s will get the address of the leader from ZooKeeper.

Related Wiki: [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]
[GitHub] flink pull request: [FLINK-2291] [runtime] Adds high availability ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1016#issuecomment-134646755

Looks very good. Minor comments that we may address after this pull request:
- The Flink Mini cluster becomes tricky, the configurations ever more opaque. This could use a rework.
- You shade Curator in Hadoop, but not in Flink. Do we expect collisions with other systems that use Curator, like newer versions of the Kafka consumers? (IIRC 0.8.3 starts using Curator.)
[jira] [Resolved] (FLINK-557) debian: permissions and users
[ https://issues.apache.org/jira/browse/FLINK-557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi resolved FLINK-557.
Resolution: Won't Fix

We don't provide a Debian image as download anymore.

debian: permissions and users
-----------------------------
Key: FLINK-557
URL: https://issues.apache.org/jira/browse/FLINK-557
Project: Flink
Issue Type: Bug
Reporter: GitHub Import
Labels: github-import
Fix For: pre-apache

Currently it seems as if all processes are run by the root user. For example, calling the following command from a normal user account leads to write problems for the logfiles:
```
/usr/share/stratosphere-dist/bin/stratosphere run -j /usr/share/stratosphere-dist/examples/stratosphere-java-examples-0.5-SNAPSHOT-WordCount.jar -a 16 file:///var/log/syslog file:///home/physikerwelt/out
```
An example of how to run services as the designated user can be found at https://github.com/physikerwelt/mathoid/tree/master/debian

Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/557
Created by: [physikerwelt|https://github.com/physikerwelt]
Labels:
Created at: Tue Mar 11 12:41:12 CET 2014
State: open
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711570#comment-14711570 ]

ASF GitHub Bot commented on FLINK-2569:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/1053#discussion_r37886531

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
+
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+			env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

That is true, but it might ease the problem a little bit if newly added tests try to use `collect`. And I doubt that we'll soon find somebody who will take care of this.

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711586#comment-14711586 ]

ASF GitHub Bot commented on FLINK-2565:
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134663435 @fhueske Added the test you requested.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711594#comment-14711594 ]

ASF GitHub Bot commented on FLINK-2565:
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134665239 Thanks! Good to merge, IMO.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711607#comment-14711607 ]

ASF GitHub Bot commented on FLINK-2200:
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134667459

There is a Spark 2.11 artifact in Maven central. I think they are doing a similar thing as we are already doing with the hadoop1/hadoop2 versions: they generate specific pom files when deploying Spark to Maven central: https://github.com/apache/spark/blob/master/dev/change-scala-version.sh

Flink API with Scala 2.11 - Maven Repository
--------------------------------------------
Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the Maven repository.
[jira] [Commented] (FLINK-2189) NullPointerException in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711608#comment-14711608 ]

Ufuk Celebi commented on FLINK-2189:
[~Felix Neutatz], can you still reproduce this error after the recent fix in 627f3cbcfdca8368eea6aa825cd9a45a9a0a841f?

NullPointerException in MutableHashTable
----------------------------------------
Key: FLINK-2189
URL: https://issues.apache.org/jira/browse/FLINK-2189
Project: Flink
Issue Type: Bug
Components: Core
Reporter: Till Rohrmann

[~Felix Neutatz] reported a {{NullPointerException}} in the {{MutableHashTable}} when running the {{ALS}} algorithm. The stack trace is the following:
{code}
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1094)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:927)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:783)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544)
	at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
	at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
He produced this error on his local machine with the following code:
{code}
implicit val env = ExecutionEnvironment.getExecutionEnvironment

val links = MovieLensUtils.readLinks(movieLensDir + "links.csv")
val movies = MovieLensUtils.readMovies(movieLensDir + "movies.csv")
val ratings = MovieLensUtils.readRatings(movieLensDir + "ratings.csv")
val tags = MovieLensUtils.readTags(movieLensDir + "tags.csv")

val ratingMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt, r.rating) }
val testMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt) }

val als = ALS()
	.setIterations(10)
	.setNumFactors(10)
	.setBlocks(150)

als.fit(ratingMatrix)
val result = als.predict(testMatrix)
result.print

val risk = als.empiricalRisk(ratingMatrix).collect().apply(0)
println("Empirical risk: " + risk)

env.execute()
{code}
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37883013

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -120,12 +131,24 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 			OUT nextElement = serializer.createInstance();
 			nextElement = format.nextRecord(nextElement);
 			if (nextElement == null && splitIterator.hasNext()) {
-				format.open(splitIterator.next());
+				InputSplit split = splitIterator.next();
+				splitNumber = split.getSplitNumber();
+				currRecord = 0l;
+				format.open(split);
 				continue;
 			} else if (nextElement == null) {
 				break;
 			}
-			ctx.collect(nextElement);
+			if (splitNumber == checkpointedSplit) {
--- End diff --

What if you've checkpointed the 2nd split after seeing the 1st and 2nd splits, and now the source is re-executed with the first split? Aren't records written again because you only save the latest checkpointed split number?
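The `restoreState("100:1")` call in the accompanying test suggests the checkpoint string encodes "splitNumber:recordOffset". A hedged sketch of that encoding (method and field names here are illustrative, not the PR's actual `Checkpointed` interface signatures; `checkpointedRecord` is an assumed field):

```java
// Illustrative sketch only -- assumes the "split:record" encoding seen in the test.
// Snapshot the current split together with the offset of the last emitted record, so
// replay can skip exactly the records already emitted, provided splits are re-read in
// a deterministic order (the concern raised in the review comment above).
public String snapshotState() {
	return splitNumber + ":" + currRecord;
}

public void restoreState(String state) {
	String[] parts = state.split(":");
	checkpointedSplit = Integer.parseInt(parts[0]);  // split that was in progress
	checkpointedRecord = Long.parseLong(parts[1]);   // records to skip within that split
}
```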
[jira] [Updated] (FLINK-2429) Remove the enableCheckpointing() without interval variant
[ https://issues.apache.org/jira/browse/FLINK-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2429:
Fix Version/s: 0.10

Remove the enableCheckpointing() without interval variant
---------------------------------------------------------
Key: FLINK-2429
URL: https://issues.apache.org/jira/browse/FLINK-2429
Project: Flink
Issue Type: Wish
Components: Streaming
Reporter: Stephan Ewen
Priority: Minor
Fix For: 0.10

I think it is not very obvious what the default checkpointing interval is. Also, when somebody activates checkpointing, shouldn't they think about what they want in terms of frequency and recovery latency tradeoffs?
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134624663 Cool, that was quick ;)
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711509#comment-14711509 ]

ASF GitHub Bot commented on FLINK-2314:
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/997#discussion_r37883013

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/FileSourceFunction.java ---
@@ -120,12 +131,24 @@ public void run(SourceContext<OUT> ctx) throws Exception {
 			OUT nextElement = serializer.createInstance();
 			nextElement = format.nextRecord(nextElement);
 			if (nextElement == null && splitIterator.hasNext()) {
-				format.open(splitIterator.next());
+				InputSplit split = splitIterator.next();
+				splitNumber = split.getSplitNumber();
+				currRecord = 0l;
+				format.open(split);
 				continue;
 			} else if (nextElement == null) {
 				break;
 			}
-			ctx.collect(nextElement);
+			if (splitNumber == checkpointedSplit) {
--- End diff --

What if you've checkpointed the 2nd split after seeing the 1st and 2nd splits, and now the source is re-executed with the first split? Aren't records written again because you only save the latest checkpointed split number?

Make Streaming File Sources Persistent
--------------------------------------
Key: FLINK-2314
URL: https://issues.apache.org/jira/browse/FLINK-2314
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Sheetal Parade
Labels: easyfix, starter

Streaming file sources should participate in the checkpointing. They should track the bytes they read from the file and checkpoint it. One can look at the sequence generating source function for an example of a checkpointed source.
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711563#comment-14711563 ]

ASF GitHub Bot commented on FLINK-2200:
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134658771

Is the Maven shade plugin bug the reason why this fails:
```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (shade-hadoop) on project flink-yarn-tests_2.11: Error creating shaded jar: 3 problems were encountered while building the effective model for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]
[ERROR] [WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.flink:flink-yarn-tests${scala.suffix}:[unknown-version], /home/robert/incubator-flink/flink-yarn-tests/pom.xml, line 36, column 14
[ERROR] [WARNING] 'parent.relativePath' of POM org.apache.flink:flink-yarn-tests_2.11:[unknown-version] (/home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml) points at org.apache.flink:flink-yarn-tests${scala.suffix} instead of org.apache.flink:flink-parent${scala.suffix}, please verify your project structure @ line 3, column 11
[ERROR] [FATAL] Non-resolvable parent POM for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]: Could not find artifact org.apache.flink:flink-parent${scala.suffix}:pom:0.10-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) and 'parent.relativePath' points at wrong local POM @ line 3, column 11
[ERROR] for project org.apache.flink:flink-yarn-tests_2.11:[unknown-version] at /home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml
```
?

> About the shading artifacts, your guess is right. Because Hadoop packages don't need Scala dependencies, I didn't add the suffix to them. But if we need the suffix for them to maintain uniformity, we can add the suffix. How do you think?

I think it's fine to leave them as they are.

> As you see, there are property expressions (${scala.suffix}) in artifactId. I think that it can be a problem. How can I solve this?

Yes, that is certainly a problem. Also, the artifact for flink-parent is not created properly in my local Maven repository. Its name is now `flink-parent${scala.suffix}/`. Maybe we have to look at other projects which are doing the same... if there are any projects ;) Kafka, for example, is offering builds for different Scala versions. Sadly, they are using sbt for building their project. Spark doesn't deploy its _2.11 artifacts to Maven central.

Flink API with Scala 2.11 - Maven Repository
--------------------------------------------
Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the Maven repository.
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/885#issuecomment-134658771

Is the Maven shade plugin bug the reason why this fails:
```
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-shade-plugin:2.4.1:shade (shade-hadoop) on project flink-yarn-tests_2.11: Error creating shaded jar: 3 problems were encountered while building the effective model for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]
[ERROR] [WARNING] 'artifactId' contains an expression but should be a constant. @ org.apache.flink:flink-yarn-tests${scala.suffix}:[unknown-version], /home/robert/incubator-flink/flink-yarn-tests/pom.xml, line 36, column 14
[ERROR] [WARNING] 'parent.relativePath' of POM org.apache.flink:flink-yarn-tests_2.11:[unknown-version] (/home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml) points at org.apache.flink:flink-yarn-tests${scala.suffix} instead of org.apache.flink:flink-parent${scala.suffix}, please verify your project structure @ line 3, column 11
[ERROR] [FATAL] Non-resolvable parent POM for org.apache.flink:flink-yarn-tests_2.11:[unknown-version]: Could not find artifact org.apache.flink:flink-parent${scala.suffix}:pom:0.10-SNAPSHOT in apache.snapshots (http://repository.apache.org/snapshots) and 'parent.relativePath' points at wrong local POM @ line 3, column 11
[ERROR] for project org.apache.flink:flink-yarn-tests_2.11:[unknown-version] at /home/robert/incubator-flink/flink-yarn-tests/target/dependency-reduced-pom.xml
```
?

> About the shading artifacts, your guess is right. Because Hadoop packages don't need Scala dependencies, I didn't add the suffix to them. But if we need the suffix for them to maintain uniformity, we can add the suffix. How do you think?

I think it's fine to leave them as they are.

> As you see, there are property expressions (${scala.suffix}) in artifactId. I think that it can be a problem. How can I solve this?

Yes, that is certainly a problem. Also, the artifact for flink-parent is not created properly in my local Maven repository. Its name is now `flink-parent${scala.suffix}/`. Maybe we have to look at other projects which are doing the same... if there are any projects ;) Kafka, for example, is offering builds for different Scala versions. Sadly, they are using sbt for building their project. Spark doesn't deploy its _2.11 artifacts to Maven central.
[GitHub] flink pull request: [FLINK-2089] [runtime] Fix illegal state in Re...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134676836

I will address the comment and merge this for 0.10 and 0.9.1.
[jira] [Commented] (FLINK-2089) Buffer recycled IllegalStateException during cancelling
[ https://issues.apache.org/jira/browse/FLINK-2089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711654#comment-14711654 ]

ASF GitHub Bot commented on FLINK-2089:
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1050#issuecomment-134676836 I will address the comment and merge this for 0.10 and 0.9.1.

Buffer recycled IllegalStateException during cancelling
-------------------------------------------------------
Key: FLINK-2089
URL: https://issues.apache.org/jira/browse/FLINK-2089
Project: Flink
Issue Type: Bug
Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
Fix For: 0.10, 0.9.1

[~rmetzger] reported the following stack trace during cancelling of high parallelism jobs:
{code}
Error: java.lang.IllegalStateException: Buffer has already been recycled.
	at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173)
	at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142)
	at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78)
	at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72)
	at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80)
	at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
	at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
	at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}
This looks like a concurrent buffer pool release/buffer usage error. I'm investigating this today.
[jira] [Commented] (FLINK-2460) ReduceOnNeighborsWithExceptionITCase failure
[ https://issues.apache.org/jira/browse/FLINK-2460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711653#comment-14711653 ]

ASF GitHub Bot commented on FLINK-2460:
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1051#issuecomment-134676740 Addressing the comment and merging this for 0.10 and 0.9.1.

ReduceOnNeighborsWithExceptionITCase failure
--------------------------------------------
Key: FLINK-2460
URL: https://issues.apache.org/jira/browse/FLINK-2460
Project: Flink
Issue Type: Bug
Reporter: Sachin Goel
Assignee: Ufuk Celebi

I noticed a build error due to failure on this case. It was on a branch of my fork, which didn't actually have anything to do with the failed test or the runtime system at all. Here's the error log: https://s3.amazonaws.com/archive.travis-ci.org/jobs/73695554/log.txt
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/1053#issuecomment-134584979

@fhueske Thanks for review. :) I addressed your comments.
* Add `getBasicAndBasicValueTupleTypeInfo` method into `TupleTypeInfo`
* Add `isBasicValueType` method into `ValueTypeInfo` class to check whether the type is a basic value or not
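With those two additions in place, the call pattern from the JIRA description should work directly; a minimal hedged sketch:

```java
// After this change, Value types are accepted by CsvReader.types(...) directly.
DataSet<Tuple2<IntValue, IntValue>> ds =
	env.readCsvFile(filename).types(IntValue.class, IntValue.class);
```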
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134639235 Looks good, +1 to merge
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37894302

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

Maybe just code style, but could you make this more explicit using if-else blocks?
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1057#issuecomment-134679865

Very useful feature. In addition, I could also imagine that the config file could be passed as a parameter to the ./bin/flink utility.
[GitHub] flink pull request: [hotfix] Allow setting FLINK_CONF_DIR by hand
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/1057#discussion_r37894520

--- Diff: flink-dist/src/main/flink-bin/bin/config.sh ---
@@ -127,7 +127,7 @@ FLINK_LIB_DIR=$FLINK_ROOT_DIR/lib
 # The above lib path is used by the shell script to retrieve jars in a
 # directory, so it needs to be unmangled.
 FLINK_ROOT_DIR_MANGLED=`manglePath "$FLINK_ROOT_DIR"`
-FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf
+if [ -z "$FLINK_CONF_DIR" ]; then FLINK_CONF_DIR=$FLINK_ROOT_DIR_MANGLED/conf; fi
--- End diff --

What would be the else block?
[GitHub] flink pull request: [scripts] resolve base path of symlinked execu...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/1049#issuecomment-134592174

Corresponding issue in order to track fixed issues for the upcoming release: https://issues.apache.org/jira/browse/FLINK-2572
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134618042 @StephanEwen I've reimplemented hashCode() and compare() accordingly.
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711416#comment-14711416 ]

ASF GitHub Bot commented on FLINK-2565:
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134618042 @StephanEwen I've reimplemented hashCode() and compare() accordingly.

Support primitive arrays as keys
--------------------------------
Key: FLINK-2565
URL: https://issues.apache.org/jira/browse/FLINK-2565
Project: Flink
Issue Type: Improvement
Components: Java API
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
[jira] [Updated] (FLINK-2538) Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
[ https://issues.apache.org/jira/browse/FLINK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ufuk Celebi updated FLINK-2538:
Affects Version/s: master

Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
------------------------------------------------------------------------
Key: FLINK-2538
URL: https://issues.apache.org/jira/browse/FLINK-2538
Project: Flink
Issue Type: Bug
Affects Versions: master
Reporter: Ted Yu
Priority: Minor
Fix For: 0.10

In ClassLoaderUtil#getUserCodeClassLoaderInfo() around line 76:
{code}
else {
	try {
		new JarFile(filePath);
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code}
The JarFile isn't closed before returning, leading to a potential resource leak.
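A hedged sketch of the straightforward fix, closing the probe `JarFile` with try-with-resources (Java 7+); this mirrors the snippet above and is not necessarily the actual patch:

{code}
else {
	// try-with-resources closes the JarFile whether or not probing succeeds
	try (JarFile jar = new JarFile(filePath)) {
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code}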
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711427#comment-14711427 ]

ASF GitHub Bot commented on FLINK-2569:
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/1053#issuecomment-134621653 Looks good to merge :+1:

CsvReader support for ValueTypes
--------------------------------
Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:
{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}
When specifying a ValueType, i.e.
{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}
the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).
{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711345#comment-14711345 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869769 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. + */ +public class GSATriangleCount implements + GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> { + + @Override + public DataSet<Tuple1<Integer>> run(Graph<Long, NullValue, NullValue> input) throws Exception { + + ExecutionEnvironment env = input.getContext(); + + // order the edges so that src is always higher than trg + DataSet<Edge<Long, NullValue>> edges = input.getEdges() + .map(new OrderEdges()).distinct(); --- End diff -- this call to `distinct()` here means that basically if you have 2 edges a-b and b-a in the input, and they are both part of a triangle, then you only count it once? Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges.
The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
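To make the ordering step concrete, a mapper in the spirit of the `OrderEdges` call quoted in the diff above could look like the sketch below; this is illustrative only, not the PR's code, and it shows why a-b and b-a collapse under `distinct()`:
{code}
// Illustrative sketch: flip each edge so the source id is always the larger
// endpoint. Afterwards a-b and b-a map to the same edge, so distinct() keeps
// one copy and such a triangle is counted once.
public static final class OrderEdges
		implements MapFunction<Edge<Long, NullValue>, Edge<Long, NullValue>> {
	@Override
	public Edge<Long, NullValue> map(Edge<Long, NullValue> edge) {
		return edge.getSource() > edge.getTarget()
				? edge
				: new Edge<>(edge.getTarget(), edge.getSource(), edge.getValue());
	}
}
{code}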
[jira] [Commented] (FLINK-2543) State handling does not support deserializing classes through the UserCodeClassloader
[ https://issues.apache.org/jira/browse/FLINK-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711343#comment-14711343 ] ASF GitHub Bot commented on FLINK-2543: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1048#issuecomment-134600109 I made the SerializedThrowable an Exception and removed the `JobFailure` message again. State handling does not support deserializing classes through the UserCodeClassloader - Key: FLINK-2543 URL: https://issues.apache.org/jira/browse/FLINK-2543 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Blocker Fix For: 0.10, 0.9.1 The current implementation of the state checkpointing does not support custom classes, because the UserCodeClassLoader is not used to deserialize the state. {code} Error: java.lang.RuntimeException: Failed to deserialize state handle and setup initial operator state. at org.apache.flink.runtime.taskmanager.Task.run(Task.java:544) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.ClassNotFoundException: com.ottogroup.bi.searchlab.searchsessionizer.OperatorState at java.net.URLClassLoader.findClass(URLClassLoader.java:381) at java.lang.ClassLoader.loadClass(ClassLoader.java:424) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) at java.lang.ClassLoader.loadClass(ClassLoader.java:357) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:348) at java.io.ObjectInputStream.resolveClass(ObjectInputStream.java:626) at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613) at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518) at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:63) at org.apache.flink.runtime.state.ByteStreamStateHandle.getState(ByteStreamStateHandle.java:33) at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.restoreInitialState(AbstractUdfStreamOperator.java:83) at org.apache.flink.streaming.runtime.tasks.StreamTask.setInitialState(StreamTask.java:276) at org.apache.flink.runtime.state.StateUtils.setOperatorState(StateUtils.java:51) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:541) {code} The issue has been reported by a user: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Class-for-state-checkpointing-td2415.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
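The standard technique behind this kind of fix is to resolve classes against the user-code classloader during deserialization; a generic sketch of that pattern, not the committed patch:
{code}
// Generic sketch: resolveClass() consults the supplied (user-code)
// classloader instead of the system classloader that produced the
// ClassNotFoundException above.
import java.io.*;

class UserCodeObjectInputStream extends ObjectInputStream {
	private final ClassLoader userCodeClassLoader;

	UserCodeObjectInputStream(InputStream in, ClassLoader cl) throws IOException {
		super(in);
		this.userCodeClassLoader = cl;
	}

	@Override
	protected Class<?> resolveClass(ObjectStreamClass desc)
			throws IOException, ClassNotFoundException {
		return Class.forName(desc.getName(), false, userCodeClassLoader);
	}
}
{code}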
[jira] [Commented] (FLINK-2565) Support primitive arrays as keys
[ https://issues.apache.org/jira/browse/FLINK-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711354#comment-14711354 ] ASF GitHub Bot commented on FLINK-2565: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134603166 I think you currently pay quite a performance price for implementing the compare methods in such a way that they delegate to the `ByteComparator` and `FloatComparator`. In order to call those, every single byte or float needs to be boxed. That can be avoided if the method directly implements the array comparisons on the primitive types. Support primitive arrays as keys Key: FLINK-2565 URL: https://issues.apache.org/jira/browse/FLINK-2565 Project: Flink Issue Type: Improvement Components: Java API Reporter: Chesnay Schepler Assignee: Chesnay Schepler -- This message was sent by Atlassian JIRA (v6.3.4#6332)
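A boxing-free variant for the `byte[]` case could look like the following sketch (names are illustrative, not the pull request's code):
{code}
// Compares the primitives directly instead of routing each element through
// ByteComparator, so no Byte boxing occurs.
public static int compareByteArrays(byte[] a, byte[] b) {
	int len = Math.min(a.length, b.length);
	for (int i = 0; i < len; i++) {
		if (a[i] != b[i]) {
			return a[i] < b[i] ? -1 : 1;
		}
	}
	return a.length - b.length; // on a common prefix, the shorter array sorts first
}
{code}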
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37867925 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/utils/TriangleCountData.java --- @@ -0,0 +1,56 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.example.utils; + +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.graph.Edge; +import org.apache.flink.types.NullValue; + +import java.util.ArrayList; +import java.util.List; + +/** + * Provides the default data sets used for the Triangle Count example. --- End diff -- There is no example :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711388#comment-14711388 ] ASF GitHub Bot commented on FLINK-2200: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r37874193 --- Diff: flink-staging/flink-gelly/pom.xml --- @@ -37,17 +37,17 @@ under the License. <dependencies> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> + <artifactId>flink-java${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-clients</artifactId> + <artifactId>flink-clients${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-test-utils</artifactId> + <artifactId>flink-test-utils${scala.suffix}</artifactId> <version>${project.version}</version> <scope>test</scope> </dependency> --- End diff -- I can not comment below this line, but you forgot the `${scala.suffix}` for the `flink-optimizer`. Flink API with Scala 2.11 - Maven Repository Key: FLINK-2200 URL: https://issues.apache.org/jira/browse/FLINK-2200 Project: Flink Issue Type: Wish Components: Build System, Scala API Reporter: Philipp Götze Assignee: Chiwan Park Priority: Trivial Labels: maven It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/885#discussion_r37874193 --- Diff: flink-staging/flink-gelly/pom.xml --- @@ -37,17 +37,17 @@ under the License. <dependencies> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-java</artifactId> + <artifactId>flink-java${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-clients</artifactId> + <artifactId>flink-clients${scala.suffix}</artifactId> <version>${project.version}</version> </dependency> <dependency> <groupId>org.apache.flink</groupId> - <artifactId>flink-test-utils</artifactId> + <artifactId>flink-test-utils${scala.suffix}</artifactId> <version>${project.version}</version> <scope>test</scope> </dependency> --- End diff -- I can not comment below this line, but you forgot the `${scala.suffix}` for the `flink-optimizer`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-2538) Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo()
[ https://issues.apache.org/jira/browse/FLINK-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2538: --- Fix Version/s: 0.10 Potential resource leak in ClassLoaderUtil#getUserCodeClassLoaderInfo() --- Key: FLINK-2538 URL: https://issues.apache.org/jira/browse/FLINK-2538 Project: Flink Issue Type: Bug Affects Versions: master Reporter: Ted Yu Priority: Minor Fix For: 0.10 In ClassLoaderUtil#getUserCodeClassLoaderInfo() around line 76: {code}
else {
	try {
		new JarFile(filePath);
		bld.append(" (valid JAR)");
	} catch (Exception e) {
		bld.append(" (invalid JAR: ").append(e.getMessage()).append(')');
	}
}
{code} The JarFile isn't closed before returning, leading to potential resource leak. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException
[ https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1989: --- Fix Version/s: (was: 0.9) 0.10 Sorting of POJO data set from TableEnv yields NotSerializableException -- Key: FLINK-1989 URL: https://issues.apache.org/jira/browse/FLINK-1989 Project: Flink Issue Type: Bug Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Aljoscha Krettek Fix For: 0.10 Sorting or grouping (or probably any other key operation) on a POJO data set that was created by a {{TableEnvironment}} yields a {{NotSerializableException}} due to a non-serializable {{java.lang.reflect.Field}} object. I traced the error back to the {{ExpressionSelectFunction}}. I guess that a {{TypeInformation}} object is stored in the generated user-code function. A {{PojoTypeInfo}} holds Field objects, which cannot be serialized. The following test can be pasted into the {{SelectITCase}} and reproduces the problem. {code}
@Test
public void testGroupByAfterTable() throws Exception {
	ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
	TableEnvironment tableEnv = new TableEnvironment();

	DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
	Table in = tableEnv.toTable(ds, "a,b,c");

	Table result = in
		.select("a, b, c");

	DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
	resultSet
		.sortPartition("a", Order.DESCENDING)
		.writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
	env.execute();

	expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" + "4,3,Hello world, " +
		"how are you?\n" + "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" + "7,4," +
		"Comment#1\n" + "8,4,Comment#2\n" + "9,4,Comment#3\n" + "10,4,Comment#4\n" + "11,5," +
		"Comment#5\n" + "12,5,Comment#6\n" + "13,5,Comment#7\n" + "14,5,Comment#8\n" + "15,5," +
		"Comment#9\n" + "16,6,Comment#10\n" + "17,6,Comment#11\n" + "18,6,Comment#12\n" + "19," +
		"6,Comment#13\n" + "20,6,Comment#14\n" + "21,6,Comment#15\n";
}

public static class ABC {
	public int a;
	public long b;
	public String c;
}
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
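The root cause is easy to demonstrate in isolation; a self-contained sketch (independent of Flink) showing that `java.lang.reflect.Field` fails Java serialization:
{code}
// Field does not implement Serializable, so any object graph holding one
// (such as a PojoTypeInfo) cannot be serialized with plain Java serialization.
import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.lang.reflect.Field;

public class FieldNotSerializableDemo {
	public int someField;

	public static void main(String[] args) throws Exception {
		Field f = FieldNotSerializableDemo.class.getDeclaredField("someField");
		try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
			out.writeObject(f); // throws java.io.NotSerializableException
		}
	}
}
{code}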
[GitHub] flink pull request: [FLINK-2386] Add new Kafka Consumer for Flink ...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1055#issuecomment-134599517 The tests in this pull request might fail because the fixes to the BufferBarrier are not backported to 0.9 yet. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user andralungu commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134590137 You're right @tillrohrmann. I updated the JIRA to also contain a small description :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2555) Hadoop Input/Output Formats are unable to access secured HDFS clusters
[ https://issues.apache.org/jira/browse/FLINK-2555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711458#comment-14711458 ] ASF GitHub Bot commented on FLINK-2555: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/1038#issuecomment-134632899 I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573 Hadoop Input/Output Formats are unable to access secured HDFS clusters -- Key: FLINK-2555 URL: https://issues.apache.org/jira/browse/FLINK-2555 Project: Flink Issue Type: Bug Affects Versions: 0.9, 0.10 Reporter: Robert Metzger Assignee: Robert Metzger Priority: Critical It seems that authentication tokens are not passed correctly to the input format when accessing secured HDFS clusters. Exception {code} org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Failed to submit job b319a28f62855917901cfb67c5457142 (Flink Java Job at Thu Aug 20 10:46:41 PDT 2015) at org.apache.flink.client.program.Client.run(Client.java:413) at org.apache.flink.client.program.Client.run(Client.java:356) at org.apache.flink.client.program.Client.run(Client.java:349) at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:63) at org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:789) at org.apache.flink.api.java.DataSet.collect(DataSet.java:408) at org.apache.flink.api.java.DataSet.print(DataSet.java:1346) at de.robertmetzger.WordCount.main(WordCount.java:73) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:315) at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:584) at org.apache.flink.client.CliFrontend.run(CliFrontend.java:290) at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:873) at org.apache.flink.client.CliFrontend$2.run(CliFrontend.java:870) at org.apache.flink.runtime.security.SecurityUtils$1.run(SecurityUtils.java:50) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556) at org.apache.flink.runtime.security.SecurityUtils.runSecured(SecurityUtils.java:47) at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:870) at org.apache.flink.client.CliFrontend.main(CliFrontend.java:922) Caused by: org.apache.flink.runtime.client.JobExecutionException: Failed to submit job b319a28f62855917901cfb67c5457142 (Flink Java Job at Thu Aug 20 10:46:41 PDT 2015) at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$submitJob(JobManager.scala:594) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:190) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at 
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.yarn.ApplicationMasterActor$$anonfun$receiveYarnMessages$1.applyOrElse(ApplicationMasterActor.scala:100) at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:92) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at
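The usual Hadoop-side mechanism for this is running the filesystem access under a `UserGroupInformation` that carries the delegation tokens, as the stack trace's `SecurityUtils.runSecured` frames hint at; a generic sketch of that pattern, not the Flink fix itself:
{code}
// Generic Hadoop pattern: code run inside doAs() sees the credentials/tokens
// attached to the UGI, which is what a secured HDFS client requires.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;

public class SecuredAccessSketch {
	public static void main(String[] args) throws Exception {
		UserGroupInformation ugi = UserGroupInformation.getCurrentUser();
		ugi.doAs(new PrivilegedExceptionAction<Void>() {
			@Override
			public Void run() throws Exception {
				// open the Hadoop input/output format against secured HDFS here
				return null;
			}
		});
	}
}
{code}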
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711360#comment-14711360 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134604410 Thanks for adding this one @andralungu. We really need this algorithm in the library! I have a few comments:
- we should clarify what the expected input graph format and the output are. If I'm not mistaken, it seems you're expecting a directed graph without edge duplicates and you count triangles ignoring edge direction. Is that correct? I would add a clear comment in the usage description about that. If we want to do this even better, we could even add a graph validator for the input.
- I'm not quite sure what happens when the graph has opposite direction edges, i.e. a-b and b-a, that are both part of a triangle. I would expect that this triangle would be counted twice, but it seems to me that you're only counting it once. Is there a reason for that?
- as you've been experimenting with this for a while, could you let us know how much better this is than your vertex-centric version? Is it always the case? If not, do you think it would make sense to add both implementations in the library and let the users choose?
Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/1054#issuecomment-134604410 Thanks for adding this one @andralungu. We really need this algorithm in the library! I have a few comments:
- we should clarify what the expected input graph format and the output are. If I'm not mistaken, it seems you're expecting a directed graph without edge duplicates and you count triangles ignoring edge direction. Is that correct? I would add a clear comment in the usage description about that. If we want to do this even better, we could even add a graph validator for the input.
- I'm not quite sure what happens when the graph has opposite direction edges, i.e. a-b and b-a, that are both part of a triangle. I would expect that this triangle would be counted twice, but it seems to me that you're only counting it once. Is there a reason for that?
- as you've been experimenting with this for a while, could you let us know how much better this is than your vertex-centric version? Is it always the case? If not, do you think it would make sense to add both implementations in the library and let the users choose?
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-987) Extend TypeSerializers and -Comparators to work directly on Memory Segments
[ https://issues.apache.org/jira/browse/FLINK-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-987: -- Fix Version/s: (was: 0.9) 0.10 Extend TypeSerializers and -Comparators to work directly on Memory Segments --- Key: FLINK-987 URL: https://issues.apache.org/jira/browse/FLINK-987 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.6-incubating Reporter: Stephan Ewen Assignee: Aljoscha Krettek Fix For: 0.10 As per discussion with [~till.rohrmann], [~uce], [~aljoscha], we suggest to change the way that the TypeSerializers/Comparators and DataInputViews/DataOutputViews work. The goal is to allow more flexibility in the construction of the binary representation of data types, and to allow partial deserialization of individual fields. Both are currently prohibited by the fact that the abstraction of the memory (into which the data goes) is a stream abstraction ({{DataInputView}}, {{DataOutputView}}). An idea is to offer a random-access buffer-like view for construction and random-access deserialization, as well as various methods to copy elements in a binary fashion between such buffers and streams. A possible set of methods for the {{TypeSerializer}} could be: {code}
long serialize(T record, TargetBuffer buffer);
T deserialize(T reuse, SourceBuffer source);
void ensureBufferSufficientlyFilled(SourceBuffer source);
<X> X deserializeField(X reuse, int logicalPos, SourceBuffer buffer);
int getOffsetForField(int logicalPos, int offset, SourceBuffer buffer);
void copy(DataInputView in, TargetBuffer buffer);
void copy(SourceBuffer buffer, DataOutputView out);
void copy(DataInputView source, DataOutputView target);
{code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
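A toy illustration of the partial-deserialization idea behind `deserializeField`/`getOffsetForField`, with `java.nio.ByteBuffer` standing in for the proposed `SourceBuffer` abstraction (an assumption for the sketch, not Flink API):
{code}
// With fixed-width fields, a single field can be read at a computed offset
// without deserializing the whole record.
import java.nio.ByteBuffer;

public class PartialReadDemo {
	// assumed record layout: [int id (4 bytes)][long value (8 bytes)]
	static long readValueField(ByteBuffer buffer, int recordOffset) {
		return buffer.getLong(recordOffset + 4); // skip the int field
	}

	public static void main(String[] args) {
		ByteBuffer buf = ByteBuffer.allocate(12);
		buf.putInt(42).putLong(123456789L);
		System.out.println(readValueField(buf, 0)); // prints 123456789
	}
}
{code}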
[jira] [Updated] (FLINK-1610) Java docs do not build
[ https://issues.apache.org/jira/browse/FLINK-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1610: --- Fix Version/s: (was: 0.9) 0.10 Java docs do not build -- Key: FLINK-1610 URL: https://issues.apache.org/jira/browse/FLINK-1610 Project: Flink Issue Type: Bug Components: Build System, Documentation Affects Versions: 0.9 Reporter: Maximilian Michels Fix For: 0.10 Among a bunch of warnings, I get the following error which prevents the java doc generation from finishing: {code} javadoc: error - com.sun.tools.doclets.internal.toolkit.util.DocletAbortException: com.sun.tools.javac.code.Symbol$CompletionFailure: class file for akka.testkit.TestKit not found Command line was: /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/bin/javadoc -Xdoclint:none @options @packages at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeJavadocCommandLine(AbstractJavadocMojo.java:5074) at org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeReport(AbstractJavadocMojo.java:1999) at org.apache.maven.plugin.javadoc.JavadocReport.generate(JavadocReport.java:130) at org.apache.maven.plugin.javadoc.JavadocReport.execute(JavadocReport.java:315) at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153) at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116) at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80) at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51) at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120) at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355) at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155) at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216) at org.apache.maven.cli.MavenCli.main(MavenCli.java:160) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:483) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229) at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415) at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1778) Improve normalized keys in composite key case
[ https://issues.apache.org/jira/browse/FLINK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711362#comment-14711362 ] Stephan Ewen commented on FLINK-1778: - Has not been addressed Improve normalized keys in composite key case - Key: FLINK-1778 URL: https://issues.apache.org/jira/browse/FLINK-1778 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.9 Currently, if we have a key (String, long), the String will take up the entire normalized key space, without being fully discerning anyways. Limiting the key prefix in size and giving space to the second key field should in most cases improve the comparison efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1778) Improve normalized keys in composite key case
[ https://issues.apache.org/jira/browse/FLINK-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1778: --- Fix Version/s: (was: 0.9) 0.10 Improve normalized keys in composite key case - Key: FLINK-1778 URL: https://issues.apache.org/jira/browse/FLINK-1778 Project: Flink Issue Type: Improvement Components: Local Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10 Currently, if we have a key (String, long), the String will take up the entire normalized key space, without being fully discerning anyways. Limiting the key prefix in size and giving space to the second key field should in most cases improve the comparison efficiency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2490][FIX]Remove the retryForever check...
Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/992#discussion_r37879401 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java --- @@ -42,11 +42,13 @@ private boolean retryForever; private Socket socket; private static final int CONNECTION_TIMEOUT_TIME = 0; - private static final int CONNECTION_RETRY_SLEEP = 1000; + public static int CONNECTION_RETRY_SLEEP = 1000; --- End diff -- This shouldn't be modifiable by everyone. Please make it just package-visible by removing the `public` modifier. Also, please keep the `final` modifier because the current implementation just lets the number of retries be configurable with a fixed 1 second retry rate. This is also documented in the user-facing API methods on DataStream. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
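Spelled out, the declaration the review asks for would be the following (a sketch of the suggestion, not the merged change):
{code}
// Package-private (so tests in the same package can read it) and final, so the
// 1-second retry rate stays fixed while only the retry count is configurable.
static final int CONNECTION_RETRY_SLEEP = 1000;
{code}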
[jira] [Updated] (FLINK-1297) Add support for tracking statistics of intermediate results
[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1297: --- Fix Version/s: (was: 0.9) 0.10 Add support for tracking statistics of intermediate results --- Key: FLINK-1297 URL: https://issues.apache.org/jira/browse/FLINK-1297 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Alexander Alexandrov Assignee: Alexander Alexandrov Fix For: 0.10 Original Estimate: 1,008h Remaining Estimate: 1,008h One of the major problems related to the optimizer at the moment is the lack of proper statistics. With the introduction of staged execution, it is possible to instrument the runtime code with a statistics facility that collects the required information for optimizing the next execution stage. I would therefore like to contribute code that can be used to gather basic statistics for the (intermediate) result of dataflows (e.g. min, max, count, count distinct) and make them available to the job manager. Before I start, I would like to hear some feedback from the other users. In particular, to handle skew (e.g. on grouping) it might be good to have some sort of detailed sketch about the key distribution of an intermediate result. I am not sure whether a simple histogram is the most effective way to go. Maybe somebody would propose another lightweight sketch that provides better accuracy. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Closed] (FLINK-2528) DriverTestBase tests fail spuriously
[ https://issues.apache.org/jira/browse/FLINK-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-2528. --- DriverTestBase tests fail spuriously Key: FLINK-2528 URL: https://issues.apache.org/jira/browse/FLINK-2528 Project: Flink Issue Type: Bug Reporter: Sachin Goel Assignee: Stephan Ewen Fix For: 0.10 MatchTaskTest fails with a Null Pointer exception. Here's the log: https://s3.amazonaws.com/archive.travis-ci.org/jobs/75780253/log.txt and the relevant parts of trace: {code} Exception in thread "Thread-154" java.lang.AssertionError: Canceling task failed: java.lang.NullPointerException at org.apache.flink.runtime.operators.testutils.DriverTestBase.cancel(DriverTestBase.java:271) at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:60) at org.junit.Assert.fail(Assert.java:88) at org.apache.flink.runtime.operators.testutils.TaskCancelThread.run(TaskCancelThread.java:68) {code} {code} "Thread-153" prio=10 tid=0x7fc1e1338800 nid=0x5cd6 waiting on condition [0x7fc1d2b1] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at org.apache.flink.runtime.operators.MatchTaskTest$MockDelayingMatchStub.join(MatchTaskTest.java:984) at org.apache.flink.runtime.operators.MatchTaskTest$MockDelayingMatchStub.join(MatchTaskTest.java:978) at org.apache.flink.runtime.operators.sort.AbstractMergeIterator.crossMwithNValues(AbstractMergeIterator.java:297) at org.apache.flink.runtime.operators.sort.AbstractMergeIterator.crossMatchingGroup(AbstractMergeIterator.java:146) at org.apache.flink.runtime.operators.sort.AbstractMergeInnerJoinIterator.callWithNextKey(AbstractMergeInnerJoinIterator.java:105) at org.apache.flink.runtime.operators.JoinDriver.run(JoinDriver.java:208) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriverInternal(DriverTestBase.java:208) at org.apache.flink.runtime.operators.testutils.DriverTestBase.testDriver(DriverTestBase.java:174) at org.apache.flink.runtime.operators.MatchTaskTest$3.run(MatchTaskTest.java:520) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711338#comment-14711338 ] ASF GitHub Bot commented on FLINK-2570: --- Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869349 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. --- End diff -- The implementation is not iterative indeed, but I wouldn't call it single-pass. Single-pass means that each edge/vertex of the graph is only read and processed once. That would be the case, e.g. in a streaming implementation. I would simply remove the 2nd sentence here :) Add a Triangle Count Library Method --- Key: FLINK-2570 URL: https://issues.apache.org/jira/browse/FLINK-2570 Project: Flink Issue Type: Task Components: Gelly Affects Versions: 0.10 Reporter: Andra Lungu Assignee: Andra Lungu Priority: Minor The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases: 1). Select neighbours with id greater than the current vertex id. Gather: no-op Sum: create a set out of these neighbours Apply: attach the computed values to the vertices 2). Propagate each received value to neighbours with higher id (again using GSA) 3). Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869349 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. --- End diff -- The implementation is not iterative indeed, but I wouldn't call it single-pass. Single-pass means that each edge/vertex of the graph is only read and processed once. That would be the case, e.g. in a streaming implementation. I would simply remove the 2nd sentence here :) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/1043#issuecomment-134603166 I think you currently pay quite a performance price for implementing the compare methods in such a way that they delegate to the `ByteComparator` and `FloatComparator`. In order to call those, every single byte or float needs to be boxed. That can be avoided if the method directly implements the array comparisons on the primitive types. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2570] [gelly] Added a Triangle Count Li...
Github user vasia commented on a diff in the pull request: https://github.com/apache/flink/pull/1054#discussion_r37869769 --- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/library/GSATriangleCount.java --- @@ -0,0 +1,187 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.graph.library; + + +import org.apache.flink.api.common.functions.MapFunction; +import org.apache.flink.api.common.functions.ReduceFunction; +import org.apache.flink.api.java.DataSet; +import org.apache.flink.api.java.ExecutionEnvironment; +import org.apache.flink.api.java.tuple.Tuple1; +import org.apache.flink.api.java.tuple.Tuple2; +import org.apache.flink.graph.GraphAlgorithm; +import org.apache.flink.graph.ReduceNeighborsFunction; +import org.apache.flink.graph.Vertex; +import org.apache.flink.graph.Edge; +import org.apache.flink.graph.Triplet; +import org.apache.flink.graph.EdgeDirection; +import org.apache.flink.graph.Graph; +import org.apache.flink.types.NullValue; + +import java.util.TreeMap; + +/** + * Triangle Count Algorithm. + * + * This algorithm operates in three phases. First, vertices select neighbors with id greater than theirs + * and send messages to them. Each received message is then propagated to neighbors with higher id. + * Finally, if a node encounters the target id in the list of received messages, it increments the number + * of triangles found. + * + * This implementation is non - iterative. The total number of triangles can be determined by performing + * a single pass through the graph. + */ +public class GSATriangleCount implements + GraphAlgorithm<Long, NullValue, NullValue, DataSet<Tuple1<Integer>>> { + + @Override + public DataSet<Tuple1<Integer>> run(Graph<Long, NullValue, NullValue> input) throws Exception { + + ExecutionEnvironment env = input.getContext(); + + // order the edges so that src is always higher than trg + DataSet<Edge<Long, NullValue>> edges = input.getEdges() + .map(new OrderEdges()).distinct(); --- End diff -- this call to `distinct()` here means that basically if you have 2 edges a-b and b-a in the input, and they are both part of a triangle, then you only count it once? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Reopened] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
[ https://issues.apache.org/jira/browse/FLINK-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi reopened FLINK-2427: I've reopened this issue to track back porting this for 0.9.1. Allow the BarrierBuffer to maintain multiple queues of blocked inputs - Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10, 0.9.1 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2427) Allow the BarrierBuffer to maintain multiple queues of blocked inputs
[ https://issues.apache.org/jira/browse/FLINK-2427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2427: --- Fix Version/s: 0.9.1 Allow the BarrierBuffer to maintain multiple queues of blocked inputs - Key: FLINK-2427 URL: https://issues.apache.org/jira/browse/FLINK-2427 Project: Flink Issue Type: Sub-task Components: Streaming Affects Versions: 0.10 Reporter: Stephan Ewen Assignee: Stephan Ewen Fix For: 0.10, 0.9.1 In corner cases (dropped barriers due to failures/startup races), this is required for proper operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1129) The Plan Visualizer Cuts of the Lower Part of Certain Operators
[ https://issues.apache.org/jira/browse/FLINK-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711309#comment-14711309 ] Ufuk Celebi commented on FLINK-1129: This has been fixed with the new runtime interface for 0.10. I don't think that it will be back ported to 0.9.1. If someone has time to do this, please change the fix version accordingly and port it back. The Plan Visualizer Cuts of the Lower Part of Certain Operators --- Key: FLINK-1129 URL: https://issues.apache.org/jira/browse/FLINK-1129 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Fix For: 0.10 Attachments: screenshot-1.png -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2386) Implement Kafka connector using the new Kafka Consumer API
[ https://issues.apache.org/jira/browse/FLINK-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711340#comment-14711340 ] ASF GitHub Bot commented on FLINK-2386: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/1055#issuecomment-134599517 The tests in this pull request might fail because the fixes to the BufferBarrier are not backported to 0.9 yet. Implement Kafka connector using the new Kafka Consumer API -- Key: FLINK-2386 URL: https://issues.apache.org/jira/browse/FLINK-2386 Project: Flink Issue Type: Improvement Components: Kafka Connector Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.10, 0.9.1 Once Kafka has released its new consumer API, we should provide a connector for that version. The release will probably be called 0.9 or 0.8.3. The connector will be mostly compatible with Kafka 0.8.2.x, except for committing offsets to the broker (the new connector expects a coordinator to be available on Kafka). To work around that, we can provide a configuration option to commit offsets to zookeeper (managed by flink code). For 0.9/0.8.3 it will be fully compatible. It will not be compatible with 0.8.1 because of mismatching Kafka messages. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-2542) It should be documented that it is required from a join key to override hashCode(), when it is not a POJO
[ https://issues.apache.org/jira/browse/FLINK-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2542. Resolution: Won't Fix I think it's OK to assume that people follow the general Object contract: {code} Note that it is generally necessary to override the {@code hashCode} method whenever this method is overridden, so as to maintain the general contract for the {@code hashCode} method, which states that equal objects must have equal hash codes. {code} If more people run into this, we can revisit this issue. It should be documented that it is required from a join key to override hashCode(), when it is not a POJO - Key: FLINK-2542 URL: https://issues.apache.org/jira/browse/FLINK-2542 Project: Flink Issue Type: Bug Components: Gelly, Java API Reporter: Gabor Gevay Priority: Minor Fix For: 0.10, 0.9.1 If the join key is not a POJO, and does not override hashCode, then the join silently fails (produces empty output). I don't see this documented anywhere. The Gelly documentation should also have this info separately, because it does joins internally on the vertex IDs, but the user might not know this, or might not look at the join documentation when using Gelly. Here is an example code: {noformat}
public static class ID implements Comparable<ID> {
	public long foo; //no default ctor -- not a POJO

	public ID(long foo) {
		this.foo = foo;
	}

	@Override
	public int compareTo(ID o) {
		return ((Long)foo).compareTo(o.foo);
	}

	@Override
	public boolean equals(Object o0) {
		if(o0 instanceof ID) {
			ID o = (ID)o0;
			return foo == o.foo;
		} else {
			return false;
		}
	}

	@Override
	public int hashCode() {
		return 42;
	}
}

public static void main(String[] args) throws Exception {
	ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

	DataSet<Tuple2<ID, Long>> inDegrees = env.fromElements(Tuple2.of(new ID(123l), 4l));
	DataSet<Tuple2<ID, Long>> outDegrees = env.fromElements(Tuple2.of(new ID(123l), 5l));

	DataSet<Tuple3<ID, Long, Long>> degrees = inDegrees.join(outDegrees, JoinOperatorBase.JoinHint.REPARTITION_HASH_FIRST).where(0).equalTo(0)
		.with(new FlatJoinFunction<Tuple2<ID, Long>, Tuple2<ID, Long>, Tuple3<ID, Long, Long>>() {
			@Override
			public void join(Tuple2<ID, Long> first, Tuple2<ID, Long> second, Collector<Tuple3<ID, Long, Long>> out) {
				out.collect(new Tuple3<ID, Long, Long>(first.f0, first.f1, second.f1));
			}
		}).withForwardedFieldsFirst("f0;f1").withForwardedFieldsSecond("f1");

	System.out.println("degrees count: " + degrees.count());
}
{noformat} This prints 1, but if I comment out the hashCode, it prints 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
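For anyone hitting this, a contract-conforming replacement for the example's `hashCode()` derives the code from the same field that `equals` compares (an illustrative drop-in for the `ID` class above):
{code}
// Equal foo values now produce equal hash codes, so hash-partitioned joins
// route matching keys to the same partition.
@Override
public int hashCode() {
	return (int) (foo ^ (foo >>> 32));
}
{code}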
[jira] [Commented] (FLINK-2490) Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket
[ https://issues.apache.org/jira/browse/FLINK-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14711439#comment-14711439 ] ASF GitHub Bot commented on FLINK-2490: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/992#discussion_r37879401 --- Diff: flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/functions/source/SocketTextStreamFunction.java --- @@ -42,11 +42,13 @@ private boolean retryForever; private Socket socket; private static final int CONNECTION_TIMEOUT_TIME = 0; - private static final int CONNECTION_RETRY_SLEEP = 1000; + public static int CONNECTION_RETRY_SLEEP = 1000; --- End diff -- This shouldn't be modifiable by everyone. Please make it just package-visible by removing the `public` modifier. Also, please keep the `final` modifier because the current implementation just lets the number of retries be configurable with a fixed 1 second retry rate. This is also documented in the user-facing API methods on DataStream. Remove unwanted boolean check in function SocketTextStreamFunction.streamFromSocket --- Key: FLINK-2490 URL: https://issues.apache.org/jira/browse/FLINK-2490 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.10 Reporter: Huang Wei Priority: Minor Fix For: 0.10 Original Estimate: 168h Remaining Estimate: 168h -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2097) Add support for JobSessions
[ https://issues.apache.org/jira/browse/FLINK-2097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2097: --- Fix Version/s: (was: 0.9) 0.10 Add support for JobSessions --- Key: FLINK-2097 URL: https://issues.apache.org/jira/browse/FLINK-2097 Project: Flink Issue Type: Sub-task Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Maximilian Michels Fix For: 0.10 Sessions make sure that the JobManager does not immediately discard a JobGraph after execution, but keeps it around for further operations to be attached to the graph. By keeping the JobGraph around, the cached streams (intermediate data) are also kept, That is the way of realizing interactive sessions on top of a streaming dataflow abstraction. ExecutionGraphs should be kept as long as - no timeout occurred or - the session has not been explicitly ended -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1129) The Plan Visualizer Cuts Off the Lower Part of Certain Operators
[ https://issues.apache.org/jira/browse/FLINK-1129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1129: --- Fix Version/s: (was: 0.9) 0.10

The Plan Visualizer Cuts Off the Lower Part of Certain Operators

Key: FLINK-1129
URL: https://issues.apache.org/jira/browse/FLINK-1129
Project: Flink
Issue Type: Bug
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10
Attachments: screenshot-1.png

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2570) Add a Triangle Count Library Method
[ https://issues.apache.org/jira/browse/FLINK-2570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711308#comment-14711308 ] ASF GitHub Bot commented on FLINK-2570: ---

Github user vasia commented on a diff in the pull request:

https://github.com/apache/flink/pull/1054#discussion_r37867925

--- Diff: flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/utils/TriangleCountData.java ---
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.graph.example.utils;
+
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.api.java.ExecutionEnvironment;
+import org.apache.flink.graph.Edge;
+import org.apache.flink.types.NullValue;
+
+import java.util.ArrayList;
+import java.util.List;
+
+/**
+ * Provides the default data sets used for the Triangle Count example.
--- End diff --

There is no example :)

Add a Triangle Count Library Method

Key: FLINK-2570
URL: https://issues.apache.org/jira/browse/FLINK-2570
Project: Flink
Issue Type: Task
Components: Gelly
Affects Versions: 0.10
Reporter: Andra Lungu
Assignee: Andra Lungu
Priority: Minor

The Gather-Sum-Apply-Scatter version of this algorithm receives an undirected graph as input and outputs the total number of triangles formed by the graph's edges. The implementation consists of three phases:

1) Select neighbours with id greater than the current vertex id.
   Gather: no-op
   Sum: create a set out of these neighbours
   Apply: attach the computed values to the vertices
2) Propagate each received value to neighbours with higher id (again using GSA)
3) Compute the number of Triangles by verifying if the final vertex contains the sender's id in its list.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
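For intuition, the higher-id-neighbour trick the three phases rely on can be sketched in a few lines of plain Java. This is a hedged single-machine illustration, not the Gelly GSA implementation: every triangle {a, b, c} with a < b < c is counted exactly once, at its smallest vertex.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TriangleCountSketch {

	/** Counts triangles in an undirected graph given as a symmetric adjacency map. */
	public static long countTriangles(Map<Long, Set<Long>> adjacency) {
		long triangles = 0;
		for (Map.Entry<Long, Set<Long>> entry : adjacency.entrySet()) {
			long vertex = entry.getKey();
			// Phase 1: keep only neighbours with an id greater than the current vertex.
			List<Long> higher = new ArrayList<Long>();
			for (long neighbour : entry.getValue()) {
				if (neighbour > vertex) {
					higher.add(neighbour);
				}
			}
			// Phases 2/3: a triangle exists whenever two higher-id neighbours
			// are themselves connected by an edge.
			for (int i = 0; i < higher.size(); i++) {
				for (int j = i + 1; j < higher.size(); j++) {
					if (adjacency.get(higher.get(i)).contains(higher.get(j))) {
						triangles++;
					}
				}
			}
		}
		return triangles;
	}
}
{code}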
[jira] [Updated] (FLINK-1240) We cannot use sortGroup on a global reduce
[ https://issues.apache.org/jira/browse/FLINK-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1240: --- Fix Version/s: (was: 0.9) 0.10

We cannot use sortGroup on a global reduce

Key: FLINK-1240
URL: https://issues.apache.org/jira/browse/FLINK-1240
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Priority: Minor
Fix For: 0.10

This is only an API problem, I hope. I also know that this is potentially a very bad idea because everything must be sorted on one node. In some cases, such as sorted first-n, this would make sense, though, since there we use a combiner.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1278) Remove the Record special code paths
[ https://issues.apache.org/jira/browse/FLINK-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1278: --- Assignee: (was: Kostas Tzoumas)

Remove the Record special code paths

Key: FLINK-1278
URL: https://issues.apache.org/jira/browse/FLINK-1278
Project: Flink
Issue Type: Bug
Components: Local Runtime
Affects Versions: 0.8.0
Reporter: Stephan Ewen
Priority: Minor
Fix For: 0.10

There are some legacy Record code paths in the runtime, which maintainers often forget to keep in sync and which cause errors if people actually use Records.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1851) Java Table API does not support Casting
[ https://issues.apache.org/jira/browse/FLINK-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1851: --- Fix Version/s: (was: 0.9) 0.10

Java Table API does not support Casting

Key: FLINK-1851
URL: https://issues.apache.org/jira/browse/FLINK-1851
Project: Flink
Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Fix For: 0.10

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711430#comment-14711430 ] ASF GitHub Bot commented on FLINK-2200: ---

Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134623006

@rmetzger Thanks! I addressed your comment and rebased on master.

Flink API with Scala 2.11 - Maven Repository

Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2200) Flink API with Scala 2.11 - Maven Repository
[ https://issues.apache.org/jira/browse/FLINK-2200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711432#comment-14711432 ] ASF GitHub Bot commented on FLINK-2200: ---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134624663

Cool, that was quick ;)

Flink API with Scala 2.11 - Maven Repository

Key: FLINK-2200
URL: https://issues.apache.org/jira/browse/FLINK-2200
Project: Flink
Issue Type: Wish
Components: Build System, Scala API
Reporter: Philipp Götze
Assignee: Chiwan Park
Priority: Trivial
Labels: maven

It would be nice if you could upload a pre-built version of the Flink API with Scala 2.11 to the maven repository.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2200] Add Flink with Scala 2.11 in Mave...
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/885#issuecomment-134623006

@rmetzger Thanks! I addressed your comment and rebased on master.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2569] [core] Add CsvReader support for ...
Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1053#discussion_r37884384

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
 
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+				env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

@tillrohrmann Yeah, I know that. But to use `collect` instead of writing to disk, we need to change all test methods in `CsvReaderITCase`. Maybe we can cover this in another issue (FLINK-2032).

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
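For reference, a hedged sketch of what the `collect`-based variant of this test could look like, assuming the surrounding `CsvReaderITCase` from the diff above. A real test should sort the result or compare independent of row order, since `collect` gives no ordering guarantee:

{code}
// Sketch only: replaces the writeAsText/resultPath comparison with collect().
// Expected values are taken from the inputData string in the test above.
List<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> result = data.collect();

Assert.assertEquals(2, result.size());
Assert.assertEquals("ABC", result.get(0).f0.getValue());
Assert.assertTrue(result.get(0).f1.getValue());
{code}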
[jira] [Commented] (FLINK-2569) CsvReader support for ValueTypes
[ https://issues.apache.org/jira/browse/FLINK-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711534#comment-14711534 ] ASF GitHub Bot commented on FLINK-2569: ---

Github user chiwanpark commented on a diff in the pull request:

https://github.com/apache/flink/pull/1053#discussion_r37884384

--- Diff: flink-tests/src/test/java/org/apache/flink/test/io/CsvReaderITCase.java ---
@@ -123,6 +132,21 @@ public void testPOJOTypeWithFieldsOrderAndFieldsSelection() throws Exception {
 		expected = "ABC,3,0.00\nDEF,5,0.00\nDEF,1,0.00\nGHI,10,0.00";
 	}
 
+	@Test
+	public void testValueTypes() throws Exception {
+		final String inputData = "ABC,true,1,2,3,4,5.0,6.0\nBCD,false,1,2,3,4,5.0,6.0";
+		final String dataPath = createInputData(inputData);
+		final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+		DataSet<Tuple8<StringValue, BooleanValue, ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue>> data =
+				env.readCsvFile(dataPath).types(StringValue.class, BooleanValue.class, ByteValue.class, ShortValue.class, IntValue.class, LongValue.class, FloatValue.class, DoubleValue.class);
+		data.writeAsText(resultPath);
--- End diff --

@tillrohrmann Yeah, I know that. But to use `collect` instead of writing to disk, we need to change all test methods in `CsvReaderITCase`. Maybe we can cover this in another issue (FLINK-2032).

CsvReader support for ValueTypes

Key: FLINK-2569
URL: https://issues.apache.org/jira/browse/FLINK-2569
Project: Flink
Issue Type: Improvement
Components: Java API
Affects Versions: 0.9
Reporter: Greg Hogan
Assignee: Chiwan Park
Priority: Minor

From the Flink Programming Guide section on Data Sources:

{quote}
readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
{quote}

When specifying a ValueType, i.e.

{code}
CsvReader csvReader = env.readCsvFile(filename);
csvReader.types(IntValue.class, IntValue.class);
{code}

the following error occurs as BasicTypeInfo is specifically requested in CsvReader.types(...).

{code}
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
	at org.apache.flink.client.program.Client.run(Client.java:327)
	at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:608)
	at org.apache.flink.client.CliFrontend.run(CliFrontend.java:296)
	at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:927)
	at org.apache.flink.client.CliFrontend.main(CliFrontend.java:977)
Caused by: java.lang.IllegalArgumentException: Type at position 0 is not a basic type.
	at org.apache.flink.api.java.typeutils.TupleTypeInfo.getBasicTupleTypeInfo(TupleTypeInfo.java:177)
	at org.apache.flink.api.java.io.CsvReader.types(CsvReader.java:393)
	at Driver.main(Driver.java:105)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
	... 6 more
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1426) JobManager AJAX requests sometimes fail
[ https://issues.apache.org/jira/browse/FLINK-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1426. Resolution: Invalid

This is superseded by the new web front end. I didn't see any recent progress in the repo you've linked. If this is still ongoing, we will have to sync it with the ongoing work to refactor the web interface.

JobManager AJAX requests sometimes fail

Key: FLINK-1426
URL: https://issues.apache.org/jira/browse/FLINK-1426
Project: Flink
Issue Type: Bug
Components: JobManager, Webfrontend
Reporter: Robert Metzger

It seems that the JobManager sometimes (I think when accessing it for the first time) does not show the number of TMs / slots. A simple workaround is re-loading the page, but users are still complaining about it.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (FLINK-1639) Document the Flink deployment scripts to make sure others know how to make a release
[ https://issues.apache.org/jira/browse/FLINK-1639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1639. Resolution: Fixed Assignee: (was: Márton Balassi)

Documentation has been added in https://cwiki.apache.org/confluence/display/FLINK/Releasing. Max also added more comments to the script itself.

Document the Flink deployment scripts to make sure others know how to make a release

Key: FLINK-1639
URL: https://issues.apache.org/jira/browse/FLINK-1639
Project: Flink
Issue Type: Task
Components: release
Reporter: Henry Saputra

Currently, only Robert knows the details of the Flink deployment and release scripts that support both Hadoop versions. We need to document the black magic used in the scripts to make sure others know how the flow works, just in case we need to push a release and Robert is not available.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2504) ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed spuriously
[ https://issues.apache.org/jira/browse/FLINK-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2504: --- Labels: test-stability (was: )

ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed failed spuriously

Key: FLINK-2504
URL: https://issues.apache.org/jira/browse/FLINK-2504
Project: Flink
Issue Type: Bug
Reporter: Till Rohrmann
Labels: test-stability

The test {{ExternalSortLargeRecordsITCase.testSortWithLongAndShortRecordsMixed}} failed in one of my Travis builds: https://travis-ci.org/tillrohrmann/flink/jobs/74881883

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881554

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.restoreState("100:1");
+
[jira] [Commented] (FLINK-2314) Make Streaming File Sources Persistent
[ https://issues.apache.org/jira/browse/FLINK-2314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711480#comment-14711480 ] ASF GitHub Bot commented on FLINK-2314: ---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881537

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
+		}
+		Assert.assertTrue(ctx.getData().size() == 200);
+	}
+
+	@Test
+	public void testFileSourceFunctionCheckpoint() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
[jira] [Commented] (FLINK-2480) Improving test coverage for org.apache.flink.streaming.api
[ https://issues.apache.org/jira/browse/FLINK-2480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711469#comment-14711469 ] ASF GitHub Bot commented on FLINK-2480: ---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-134635720

@HuangWHWHW Can you access the CI reports now? Has the Travis team fixed the problem?

Improving test coverage for org.apache.flink.streaming.api

Key: FLINK-2480
URL: https://issues.apache.org/jira/browse/FLINK-2480
Project: Flink
Issue Type: Test
Components: Streaming
Affects Versions: 0.10
Reporter: Huang Wei
Fix For: 0.10
Original Estimate: 504h
Remaining Estimate: 504h

The streaming API is quite a bit newer than the other code, so it is not as well covered with tests.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2480][test]Add tests for PrintSinkFunct...
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/991#issuecomment-134635720

@HuangWHWHW Can you access the CI reports now? Has the Travis team fixed the problem?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2291) Use ZooKeeper to elect JobManager leader and send information to TaskManagers
[ https://issues.apache.org/jira/browse/FLINK-2291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711513#comment-14711513 ] ASF GitHub Bot commented on FLINK-2291: ---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/1016#issuecomment-134647030

+1 to merge, we should follow up on the Mini cluster and Curator shading separately

Use ZooKeeper to elect JobManager leader and send information to TaskManagers

Key: FLINK-2291
URL: https://issues.apache.org/jira/browse/FLINK-2291
Project: Flink
Issue Type: Sub-task
Components: JobManager, TaskManager
Affects Versions: 0.10
Reporter: Till Rohrmann
Assignee: Till Rohrmann
Fix For: 0.10

Use ZooKeeper to determine the leader of a set of {{JobManagers}}, which will act as the responsible {{JobManager}} for all {{TaskManagers}}. The {{TaskManagers}} will get the address of the leader from ZooKeeper.

Related Wiki: [https://cwiki.apache.org/confluence/display/FLINK/JobManager+High+Availability]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
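For background, the leader-election building block itself is small when expressed with Apache Curator (the library the shading remark above refers to). A hedged sketch; the connection string and znode path are invented for illustration and are not Flink's actual configuration:

{code}
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class LeaderElectionSketch {
	public static void main(String[] args) throws Exception {
		// Illustrative values, not Flink's real config keys or paths.
		CuratorFramework client = CuratorFrameworkFactory.newClient(
				"localhost:2181", new ExponentialBackoffRetry(1000, 3));
		client.start();

		LeaderLatch latch = new LeaderLatch(client, "/flink/leader-latch");
		latch.start();
		latch.await(); // blocks until this participant is elected leader

		// A leading JobManager would now publish its address under a
		// well-known znode for the TaskManagers to read; omitted here.
		System.out.println("has leadership: " + latch.hasLeadership());

		latch.close();
		client.close();
	}
}
{code}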
[jira] [Resolved] (FLINK-1158) Logging property files missing in project created by archetypes
[ https://issues.apache.org/jira/browse/FLINK-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1158. Resolution: Fixed Fix Version/s: 0.9

The current archetypes have a logging property file.

Logging property files missing in project created by archetypes

Key: FLINK-1158
URL: https://issues.apache.org/jira/browse/FLINK-1158
Project: Flink
Issue Type: Improvement
Components: Build System
Affects Versions: 0.7.0-incubating
Reporter: Till Rohrmann
Fix For: 0.9

If one creates a Flink project using the archetypes, then there are no predefined logging properties files. It would be very convenient for the user to have them generated.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2565] Support primitive Arrays as keys
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/1043#issuecomment-134665239

Thanks! Good to merge, IMO.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Resolved] (FLINK-1929) Add code to cleanly stop a running streaming topology
[ https://issues.apache.org/jira/browse/FLINK-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1929. Resolution: Fixed

Add code to cleanly stop a running streaming topology

Key: FLINK-1929
URL: https://issues.apache.org/jira/browse/FLINK-1929
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.9
Reporter: Robert Metzger

Right now it's not possible to cleanly stop a running streaming topology. Cancelling the job will cancel all operators, but for proper exactly-once processing from Kafka sources, we need to provide a way to stop the sources first, wait until all remaining tuples have been processed, and then shut down the sources (so that they can commit the right offset to Zookeeper).

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
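The stop-then-drain pattern the issue asks for, sketched against the real SourceFunction interface. The poll and offset-commit helpers are hypothetical stand-ins for the Kafka-specific parts:

{code}
import org.apache.flink.streaming.api.functions.source.SourceFunction;

// Sketch: a source whose loop exits on request, so in-flight tuples drain
// downstream before the source commits its offset and shuts down.
public class CleanlyStoppableSource implements SourceFunction<String> {

	private volatile boolean running = true;

	@Override
	public void run(SourceContext<String> ctx) throws Exception {
		while (running) {
			String record = pollNextRecord(); // hypothetical helper
			if (record != null) {
				ctx.collect(record);
			}
		}
		commitOffsetToZooKeeper(); // hypothetical helper: commit the right offset
	}

	@Override
	public void cancel() {
		// A dedicated stop() with drain semantics is what the issue proposes;
		// cancel() is the closest existing hook.
		running = false;
	}

	private String pollNextRecord() { return null; } // placeholder
	private void commitOffsetToZooKeeper() { }       // placeholder
}
{code}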
[jira] [Updated] (FLINK-2276) Travis build error
[ https://issues.apache.org/jira/browse/FLINK-2276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-2276: --- Labels: test-stability (was: )

Travis build error

Key: FLINK-2276
URL: https://issues.apache.org/jira/browse/FLINK-2276
Project: Flink
Issue Type: Bug
Reporter: Sachin Goel
Labels: test-stability

testExecutionFailsAfterTaskMarkedFailed failed on Travis. Here is the log output: https://s3.amazonaws.com/archive.travis-ci.org/jobs/68288986/log.txt

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2189) NullPointerException in MutableHashTable
[ https://issues.apache.org/jira/browse/FLINK-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711620#comment-14711620 ] Till Rohrmann commented on FLINK-2189: ---

[~JonathanH5] encountered this problem recently.

NullPointerException in MutableHashTable

Key: FLINK-2189
URL: https://issues.apache.org/jira/browse/FLINK-2189
Project: Flink
Issue Type: Bug
Components: Core
Reporter: Till Rohrmann

[~Felix Neutatz] reported a {{NullPointerException}} in the {{MutableHashTable}} when running the {{ALS}} algorithm. The stack trace is the following:

{code}
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.operators.hash.HashPartition.spillPartition(HashPartition.java:310)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.spillPartition(MutableHashTable.java:1094)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.insertBucketEntry(MutableHashTable.java:927)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:783)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
	at org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:544)
	at org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
	at org.apache.flink.runtime.operators.MatchDriver.run(MatchDriver.java:173)
	at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
	at java.lang.Thread.run(Thread.java:745)
{code}

He produced this error on his local machine with the following code:

{code}
implicit val env = ExecutionEnvironment.getExecutionEnvironment

val links = MovieLensUtils.readLinks(movieLensDir + "links.csv")
val movies = MovieLensUtils.readMovies(movieLensDir + "movies.csv")
val ratings = MovieLensUtils.readRatings(movieLensDir + "ratings.csv")
val tags = MovieLensUtils.readTags(movieLensDir + "tags.csv")

val ratingMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt, r.rating) }
val testMatrix = ratings.map { r => (r.userId.toInt, r.movieId.toInt) }

val als = ALS()
  .setIterations(10)
  .setNumFactors(10)
  .setBlocks(150)

als.fit(ratingMatrix)
val result = als.predict(testMatrix)
result.print

val risk = als.empiricalRisk(ratingMatrix).collect().apply(0)
println("Empirical risk: " + risk)

env.execute()
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2314] - Added Checkpointing to File Sou...
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/997#discussion_r37881478

--- Diff: flink-staging/flink-streaming/flink-streaming-core/src/test/java/org/apache/flink/streaming/util/FileSourceFunctionTest.java ---
@@ -0,0 +1,208 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.streaming.util;
+
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.common.accumulators.Accumulator;
+import org.apache.flink.api.common.functions.RuntimeContext;
+import org.apache.flink.api.common.io.FileInputFormat;
+import org.apache.flink.api.java.functions.KeySelector;
+import org.apache.flink.api.java.typeutils.TypeExtractor;
+import org.apache.flink.configuration.Configuration;
+import org.apache.flink.core.fs.FileInputSplit;
+import org.apache.flink.core.fs.Path;
+import org.apache.flink.core.io.InputSplit;
+import org.apache.flink.runtime.operators.testutils.MockEnvironment;
+import org.apache.flink.runtime.operators.testutils.MockInputSplitProvider;
+import org.apache.flink.runtime.state.LocalStateHandle;
+import org.apache.flink.streaming.api.functions.source.FileSourceFunction;
+import org.apache.flink.streaming.api.functions.source.SourceFunction;
+import org.apache.flink.streaming.api.watermark.Watermark;
+import org.apache.flink.streaming.runtime.tasks.StreamingRuntimeContext;
+import org.apache.flink.types.IntValue;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+
+public class FileSourceFunctionTest {
+	@Test
+	public void testFileSourceFunction() {
+		DummyFileInputFormat inputFormat = new DummyFileInputFormat();
+		RuntimeContext runtimeContext = new StreamingRuntimeContext("MockTask", new MockEnvironment(3 * 1024 * 1024,
+				inputFormat.getDummyInputSplitProvider(), 1024), null, new ExecutionConfig(), new DummyModKey(2),
+				new LocalStateHandle.LocalStateHandleProvider<Serializable>(), new HashMap<String, Accumulator<?, ?>>());
+
+		inputFormat.setFilePath("file:///some/none/existing/directory/");
+		FileSourceFunction<IntValue> fileSourceFunction = new FileSourceFunction<IntValue>(inputFormat, TypeExtractor.getInputFormatTypes(inputFormat));
+
+		fileSourceFunction.setRuntimeContext(runtimeContext);
+		DummyContext<IntValue> ctx = new DummyContext<IntValue>();
+		try {
+			fileSourceFunction.open(new Configuration());
+			fileSourceFunction.run(ctx);
+		} catch (Exception e) {
+			e.printStackTrace();
--- End diff --

Why do you print the stack trace instead of simply letting the exception bubble up? Is this an expected test exception?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-2555] Properly pass security credential...
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/1038#issuecomment-134632899

I've opened another issue for that: https://issues.apache.org/jira/browse/FLINK-2573

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2394) HadoopOutputFormat OutputCommitter defaults to FileOutputCommitter
[ https://issues.apache.org/jira/browse/FLINK-2394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711474#comment-14711474 ] ASF GitHub Bot commented on FLINK-2394: ---

GitHub user fhueske opened a pull request:

https://github.com/apache/flink/pull/1056

[FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

Right now, Flink's wrappers for Hadoop OutputFormats always use a `FileOutputCommitter`.
- In the `mapreduce` API, Hadoop OutputFormats have a method `getOutputCommitter()` which can be overwritten and returns a `FileOutputCommitter` by default.
- In the `mapred` API, the `OutputCommitter` should be obtained from the `JobConf`. If nothing custom is set, a `FileOutputCommitter` is returned.

This PR uses the respective methods to obtain the correct `OutputCommitter`. Since `FileOutputCommitter` is the default in both cases, the original semantics are preserved if no custom committer is implemented or set by the user. I also added convenience methods to the constructors of the `mapred` wrappers to set the `OutputCommitter` in the `JobConf`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/fhueske/flink hadoopOutCommitter

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1056.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1056

commit a632203a948f2e7973339a0eab88750f7ce70cc5
Author: Fabian Hueske fhue...@apache.org
Date: 2015-07-30T19:47:01Z

    [FLINK-2394] [fix] HadoopOutputFormats use correct OutputCommitters.

HadoopOutputFormat OutputCommitter defaults to FileOutputCommitter

Key: FLINK-2394
URL: https://issues.apache.org/jira/browse/FLINK-2394
Project: Flink
Issue Type: Bug
Components: Hadoop Compatibility
Affects Versions: 0.9.0
Reporter: Stefano Bortoli
Assignee: Fabian Hueske
Fix For: 0.10, 0.9.1

MongoOutputFormat does not write back to the collection because the HadoopOutputFormat wrapper does not allow setting the MongoOutputCommitter and defaults to FileOutputCommitter. Therefore, on close and globalFinalize execution the commit does not happen and the mongo collection stays untouched.

A simple solution would be to:
1 - create a constructor of HadoopOutputFormatBase and HadoopOutputFormat that gets the OutputCommitter as a parameter
2 - change the outputCommitter field of HadoopOutputFormatBase to be a generic OutputCommitter
3 - remove the default assignment of the outputCommitter to FileOutputCommitter in open() and finalizeGlobal(), or keep it as a default in case of no specific assignment.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
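To illustrate the `mapreduce`-API side of the fix: a custom OutputFormat supplies its own committer via `getOutputCommitter()`, which the corrected wrapper now respects instead of hard-coding `FileOutputCommitter`. A hedged sketch; all non-Hadoop class names are made up:

{code}
import java.io.IOException;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

// Illustrative only: MongoLikeOutputFormat / MongoLikeCommitter are made-up names.
public class MongoLikeOutputFormat extends OutputFormat<Object, Object> {

	@Override
	public RecordWriter<Object, Object> getRecordWriter(TaskAttemptContext context)
			throws IOException, InterruptedException {
		throw new UnsupportedOperationException("omitted in this sketch");
	}

	@Override
	public void checkOutputSpecs(JobContext context) {
		// no-op for the sketch
	}

	@Override
	public OutputCommitter getOutputCommitter(TaskAttemptContext context) {
		// This is the hook the fixed wrapper calls, so a custom committer
		// (e.g. one that commits to MongoDB) is actually used.
		return new MongoLikeCommitter();
	}

	private static class MongoLikeCommitter extends OutputCommitter {
		@Override public void setupJob(JobContext jobContext) { }
		@Override public void setupTask(TaskAttemptContext taskContext) { }
		@Override public boolean needsTaskCommit(TaskAttemptContext taskContext) { return true; }
		@Override public void commitTask(TaskAttemptContext taskContext) { /* commit writes here */ }
		@Override public void abortTask(TaskAttemptContext taskContext) { }
	}
}
{code}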
[jira] [Resolved] (FLINK-2478) The page “FlinkML - Machine Learning for Flink” https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/ contains a dead link
[ https://issues.apache.org/jira/browse/FLINK-2478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2478. Resolution: Fixed

Fixed via e68c86f.

The page “FlinkML - Machine Learning for Flink” https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/ contains a dead link

Key: FLINK-2478
URL: https://issues.apache.org/jira/browse/FLINK-2478
Project: Flink
Issue Type: Task
Components: Documentation
Affects Versions: 0.10
Reporter: Slim Baltagi
Assignee: Till Rohrmann
Priority: Minor

"Note that FlinkML is currently not part of the binary distribution. See linking with it for cluster execution here." The word 'here' links to a dead URL:
https://ci.apache.org/projects/flink/flink-docs-master/libs/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution

The correct link is:
https://ci.apache.org/projects/flink/flink-docs-master/apis/cluster_execution.html#linking-with-modules-not-contained-in-the-binary-distribution

-- This message was sent by Atlassian JIRA (v6.3.4#6332)