[jira] [Resolved] (FLINK-1457) RAT check fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1457. -- Resolution: Fixed Fixed in 06c2c35a2c165b6a612b0be6d00d99f287330dc7 Thanks for the patch! > RAT check fails on Windows > -- > > Key: FLINK-1457 > URL: https://issues.apache.org/jira/browse/FLINK-1457 > Project: Flink > Issue Type: Bug >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Trivial > > On (my) Windows 7 (Maven 3.2.2), the RAT check fails as > flink-addons/flink-avro/src/test/resources/testdata.avro has no approved > license. Not being an actual code file, it should be excluded from the RAT > check so that verification also passes on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
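The fix amounts to a RAT exclusion for the binary test file. The snippet below is a hedged sketch of what such an exclusion looks like with the apache-rat-plugin; the exact plugin configuration and path pattern are assumptions, not taken from commit 06c2c35:

```xml
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- binary Avro test data, not a source file: skip the license header check -->
      <exclude>**/src/test/resources/testdata.avro</exclude>
    </excludes>
  </configuration>
</plugin>
```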
[jira] [Commented] (FLINK-1457) RAT check fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294569#comment-14294569 ] ASF GitHub Bot commented on FLINK-1457: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/345
[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/343 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/345
[jira] [Resolved] (FLINK-1329) Enable constant field definitions for Pojo DataTypes
[ https://issues.apache.org/jira/browse/FLINK-1329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1329. -- Resolution: Implemented Implemented in de8e066ccbd0a31e5746bc0bee524a48bba3a552 > Enable constant field definitions for Pojo DataTypes > > > Key: FLINK-1329 > URL: https://issues.apache.org/jira/browse/FLINK-1329 > Project: Flink > Issue Type: Sub-task > Components: Java API, Optimizer, Scala API >Affects Versions: 0.7.0-incubating >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > Enable constant field annotations also for Pojo data types.
[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/343#issuecomment-71765382 +1
[jira] [Resolved] (FLINK-1328) Rework Constant Field Annotations
[ https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1328. -- Resolution: Fixed Fixed in de8e066ccbd0a31e5746bc0bee524a48bba3a552 > Rework Constant Field Annotations > - > > Key: FLINK-1328 > URL: https://issues.apache.org/jira/browse/FLINK-1328 > Project: Flink > Issue Type: Improvement > Components: Java API, Optimizer, Scala API >Affects Versions: 0.7.0-incubating >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > Constant field annotations are used by the optimizer to determine whether > physical data properties such as sorting or partitioning are retained by user > defined functions. > The current implementation is limited and can be extended in several ways: > - Fields that are copied to other positions > - Field definitions for non-tuple data types (Pojos) > There is a pull request (#83) that goes in this direction and can be > extended.
[jira] [Resolved] (FLINK-846) Semantic Annotations - Inconsistencies
[ https://issues.apache.org/jira/browse/FLINK-846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-846. - Resolution: Implemented Implemented in de8e066ccbd0a31e5746bc0bee524a48bba3a552 > Semantic Annotations - Inconsistencies > -- > > Key: FLINK-846 > URL: https://issues.apache.org/jira/browse/FLINK-846 > Project: Flink > Issue Type: Sub-task >Reporter: GitHub Import >Assignee: Fabian Hueske > Labels: github-import > Fix For: pre-apache > > > This is basically a thread for discussion. There are still a few > inconsistencies in the semantic annotations. The following examples > illustrate that: > _Constant fields_: > `"1 -> 0,1" "3->4"` means field 1 is copied unmodified to 0 and 1, while 3 > becomes 4. > `"1 ->0,1 ; 3->4"` means the same thing, expressed in a single string > (semicolon is the delimiter between statements about fields). > `"1 , 3"` means that 1 and 3 remain constant as 1 and 3. Note that comma is > the delimiter here. > _Read Fields_: > `"0 , 2"` means that fields 0 and 2 are read. Note that here, the delimiter > is the comma, like with the standalone constant fields, unlike the constant > fields with arrow notation. > I find that a bit inconsistent, especially the mixed use of comma and > semicolon for the constant fields. > - Do we want to keep it that way? > - Or use the semicolon for the standalone constant fields and read fields? > - Or use the comma in the constant fields (finding an alternative delimiter > for the right hand side of the arrow)? > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/846 > Created by: [StephanEwen|https://github.com/StephanEwen] > Labels: > Created at: Wed May 21 20:54:26 CEST 2014 > State: open
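To make the delimiter ambiguity concrete, here is a small hypothetical parser sketch (the class name and behavior are illustrative, not Flink's actual SemanticPropUtil): semicolons separate arrow statements, commas separate the targets on the right of an arrow, but commas also separate standalone constant fields, so the meaning of a comma depends on context.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ForwardSpecParser {
    // Parses e.g. "1 ->0,1 ; 3->4" or "1 , 3" into source -> target positions.
    // Hypothetical sketch, not Flink's actual implementation.
    public static Map<Integer, Set<Integer>> parse(String spec) {
        Map<Integer, Set<Integer>> forwards = new LinkedHashMap<>();
        // Arrow statements are separated by ';', standalone constant fields by ','.
        String delim = spec.contains("->") ? ";" : ",";
        for (String stmt : spec.split(delim)) {
            stmt = stmt.trim();
            if (stmt.isEmpty()) continue;
            if (stmt.contains("->")) {
                String[] parts = stmt.split("->");
                int src = Integer.parseInt(parts[0].trim());
                Set<Integer> targets = new TreeSet<>();
                for (String t : parts[1].split(",")) {
                    targets.add(Integer.parseInt(t.trim()));
                }
                forwards.put(src, targets);
            } else {
                // standalone field: constant at the same position
                int f = Integer.parseInt(stmt);
                forwards.put(f, new TreeSet<>(Collections.singleton(f)));
            }
        }
        return forwards;
    }

    public static void main(String[] args) {
        System.out.println(parse("1 ->0,1 ; 3->4")); // {1=[0, 1], 3=[4]}
        System.out.println(parse("1 , 3"));          // {1=[1], 3=[3]}
    }
}
```

The context-dependent choice of `delim` is exactly the inconsistency the thread discusses: a single delimiter for statements would remove the special case.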
[jira] [Closed] (FLINK-1156) Bug in Constant Fields Annotation (Semantic Properties)
[ https://issues.apache.org/jira/browse/FLINK-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske closed FLINK-1156. Resolution: Fixed Fixed in de8e066ccbd0a31e5746bc0bee524a48bba3a552 > Bug in Constant Fields Annotation (Semantic Properties) > > > Key: FLINK-1156 > URL: https://issues.apache.org/jira/browse/FLINK-1156 > Project: Flink > Issue Type: Sub-task >Reporter: Aljoscha Krettek >Assignee: Fabian Hueske > > When I change the first test in SemanticPropUtilTest.java to this:
> {code}
> @Test
> public void testConstantWithArrowIndividualStrings() {
>   String[] constantFields = { "0->0,1", "1->3" };
>   TypeInformation<?> type = new TupleTypeInfo<Tuple3<Integer, Integer, Integer>>(
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
>   TypeInformation<?> outType = new TupleTypeInfo<Tuple4<Integer, Integer, Integer, Integer>>(
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO,
>       BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.INT_TYPE_INFO);
>   SingleInputSemanticProperties sp =
>       SemanticPropUtil.getSemanticPropsSingleFromString(constantFields, null, null, type, outType);
>   FieldSet fs = sp.getForwardedField(0);
>   Assert.assertTrue(fs.size() == 2);
>   Assert.assertTrue(fs.contains(0));
>   Assert.assertTrue(fs.contains(1));
>   fs = sp.getForwardedField(1);
>   Assert.assertTrue(fs.size() == 1);
>   Assert.assertTrue(fs.contains(2));
> }
> {code}
> It fails. It seems the check whether a tuple field is in range is performed > on the input type and the output type is being ignored. > Also, I'm not sure this whole thing works anymore with [~rmetzger]'s recent > POJO rework and nested tuple field selection rework.
[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations
[ https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294549#comment-14294549 ] ASF GitHub Bot commented on FLINK-1328: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/311
[GitHub] flink pull request: integrated forwarded fields
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/83
[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/311
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294517#comment-14294517 ] Fabian Hueske commented on FLINK-1396: -- +1 for having direct API methods for HadoopInputFormats. If you do it manually, it's quite a few lines of ugly boilerplate code. I am also +1 for moving the hadoop-compat code to flink-java. Are we talking about all wrappers or only IFs and OFs, btw? > Add hadoop input formats directly to the user API. > -- > > Key: FLINK-1396 > URL: https://issues.apache.org/jira/browse/FLINK-1396 > Project: Flink > Issue Type: Bug >Reporter: Robert Metzger >Assignee: Aljoscha Krettek >
[GitHub] flink pull request: [Typo] Delete DiscardingOuputFormat
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/343#issuecomment-71760881 +1 good to merge
[jira] [Commented] (FLINK-1450) Add Fold operator to the Streaming api
[ https://issues.apache.org/jira/browse/FLINK-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294506#comment-14294506 ] Fabian Hueske commented on FLINK-1450: -- So Reduce is kind of a special case of Fold where the input and output types happen to be the same and the first element is a "neutral" element? > Add Fold operator to the Streaming api > -- > > Key: FLINK-1450 > URL: https://issues.apache.org/jira/browse/FLINK-1450 > Project: Flink > Issue Type: New Feature > Components: Streaming >Affects Versions: 0.9 >Reporter: Gyula Fora >Priority: Minor > Labels: starter > > The streaming API currently doesn't support a fold operator. > This operator would work as the foldLeft method in Scala. This would allow > effective implementations in a lot of cases where the simple reduce is > inappropriate due to different return types.
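The relationship described in the comment above can be illustrated outside Flink with plain Java (method names here are illustrative, not Flink's streaming API): fold lets the accumulator type differ from the element type, and reduce is fold specialized to equal types with a neutral seed.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.BinaryOperator;

public class FoldVsReduce {
    // fold: accumulator type A may differ from element type T
    static <T, A> A fold(List<T> xs, A seed, BiFunction<A, T, A> f) {
        A acc = seed;
        for (T x : xs) acc = f.apply(acc, x);
        return acc;
    }

    // reduce: fold specialized to A == T, seeded with a neutral element
    static <T> T reduce(List<T> xs, T neutral, BinaryOperator<T> f) {
        return fold(xs, neutral, f);
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        // fold can change the result type: concatenate ints into a String
        String s = fold(xs, "", (acc, x) -> acc + x);   // "1234"
        int sum = reduce(xs, 0, Integer::sum);          // 10
        System.out.println(s + " " + sum);
    }
}
```

This is why fold is the more general streaming operator requested in the issue: reduce cannot express the String-building example at all.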
[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294501#comment-14294501 ] ASF GitHub Bot commented on FLINK-1318: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/265#issuecomment-71760224 Yes, that's still left to do... ;-) > Make quoted String parsing optional and configurable for CSVInputFormats > > > Key: FLINK-1318 > URL: https://issues.apache.org/jira/browse/FLINK-1318 > Project: Flink > Issue Type: Improvement > Components: Java API, Scala API >Affects Versions: 0.8 >Reporter: Fabian Hueske >Assignee: Fabian Hueske >Priority: Minor > > With the current implementation of the CSVInputFormat, quoted string parsing > kicks in, if the first non-whitespace character of a field is a double quote. > I see two issues with this implementation: > 1. Quoted String parsing cannot be disabled > 2. The quoting character is fixed to double quotes (") > I propose to add parameters to disable quoted String parsing and set the > quote character.
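The two proposed parameters (disable quoted parsing, configurable quote character) can be sketched with a minimal line splitter. This is an illustrative sketch only, not Flink's CsvInputFormat; all names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedFieldParser {
    // Splits one CSV line. Quoted parsing is optional and the quote char configurable,
    // the two knobs the issue asks for.
    public static List<String> parseLine(String line, char delimiter,
                                         boolean quotedParsing, char quoteChar) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quotedParsing && c == quoteChar) {
                inQuotes = !inQuotes;            // toggle quoted state, drop the quote
            } else if (c == delimiter && !inQuotes) {
                fields.add(cur.toString());      // field boundary outside quotes
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }
}
```

With `parseLine("a|'b|c'|d", '|', true, '\'')` the quoted delimiter is kept inside the field (`[a, b|c, d]`); with `quotedParsing` disabled the same input splits naively into `[a, 'b, c', d]`.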
[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/265#issuecomment-71760224 Yes, that's still left to do... ;-)
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294433#comment-14294433 ] ASF GitHub Bot commented on FLINK-1419: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71754470 You're right Chesnay. I assume that the faulty DC wasn't noticed because it was probably never really used ;-) Your solution should make the DC work properly. We could even get rid of the second counter by simply decrementing the counter upon deletion. If the counter is 0, then the file can be deleted. Nice illustrations btw. > DistributedCache doesn't preserver files for subsequent operations > -- > > Key: FLINK-1419 > URL: https://issues.apache.org/jira/browse/FLINK-1419 > Project: Flink > Issue Type: Bug >Affects Versions: 0.8, 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler > > When subsequent operations want to access the same files in the DC it > frequently happens that the files are not created for the following operation. > This is fairly odd, since the DC is supposed to either a) preserve files when > another operation kicks in within a certain time window, or b) just recreate > the deleted files. Both things don't happen. > Increasing the time window had no effect. > I'd like to use this issue as a starting point for a more general discussion > about the DistributedCache. > Currently: > 1. all files reside in a common job-specific directory > 2. are deleted during the job. > > One thing that was brought up about Trait 1 is that it basically forbids > modification of the files, concurrent access and all. Personally I'm not sure > if this is a problem. Changing it to a task-specific place solved the issue > though. > I'm more concerned about Trait #2. Besides the mentioned issue, the deletion > is realized with the scheduler, which adds a lot of complexity to the current > code. (It really is a pain to work on...)
> If we moved the deletion to the end of the job, it could be done as a clean-up > step in the TaskManager. With this we could reduce the DC to a > cacheFile(String source) method, the delete method in the TM, and throw out > everything else. > Also, the current implementation implies that big files may be copied > multiple times. This may be undesired, depending on how big the files are.
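The single-counter idea from the comment above (decrement on release; delete the local copy once the count reaches zero) can be sketched as follows. Class and method names are hypothetical, not Flink's actual DistributedCache:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of reference-counted caching: each task that needs a cached file
// acquires it, and the local copy becomes deletable once the count drops to 0.
public class RefCountedCache {
    private final Map<String, Integer> refCounts = new HashMap<>();

    public synchronized void acquire(String path) {
        // the first acquire would trigger the actual copy to the local filesystem
        refCounts.merge(path, 1, Integer::sum);
    }

    // Returns true when the caller may delete the local copy.
    public synchronized boolean release(String path) {
        int count = refCounts.merge(path, -1, Integer::sum);
        if (count <= 0) {
            refCounts.remove(path);
            return true;
        }
        return false;
    }
}
```

This keeps exactly one counter per file, as suggested, instead of a pair of create/delete counters; callers must pair each acquire with one release.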
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71754470 You're right Chesnay. I assume that the faulty DC wasn't noticed because it was probably never really used ;-) Your solution should make the DC work properly. We could even get rid of the second counter by simply decrementing the counter upon deletion. If the counter is 0, then the file can be deleted. Nice illustrations btw.
[jira] [Resolved] (FLINK-1401) Add plan visualiser support for Streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora resolved FLINK-1401. --- Resolution: Fixed > Add plan visualiser support for Streaming programs > -- > > Key: FLINK-1401 > URL: https://issues.apache.org/jira/browse/FLINK-1401 > Project: Flink > Issue Type: New Feature > Components: Streaming >Affects Versions: 0.9 >Reporter: Gyula Fora >Assignee: Gyula Fora >Priority: Minor > > The streaming api currently does not generate visualisable program plans as > the batch api does. This feature needs to be added to help users.
[jira] [Resolved] (FLINK-1434) Web interface cannot be used to run streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gyula Fora resolved FLINK-1434. --- Resolution: Fixed > Web interface cannot be used to run streaming programs > -- > > Key: FLINK-1434 > URL: https://issues.apache.org/jira/browse/FLINK-1434 > Project: Flink > Issue Type: Bug > Components: Streaming, Webfrontend >Affects Versions: 0.9 >Reporter: Gyula Fora >Assignee: Gyula Fora > > Flink streaming programs currently cannot be submitted through the web > client. When you try to run the jar you get a ProgramInvocationException. > The reason for this might be that streaming programs completely bypass the > use of Plans for job execution and the streaming execution environment > directly submits the jobgraph to the client.
[jira] [Commented] (FLINK-1434) Web interface cannot be used to run streaming programs
[ https://issues.apache.org/jira/browse/FLINK-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294410#comment-14294410 ] ASF GitHub Bot commented on FLINK-1434: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/334
[GitHub] flink pull request: [FLINK-1434] [FLINK-1401] Streaming support ad...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/334
[jira] [Commented] (FLINK-1458) Interfaces and abstract classes are not valid types
[ https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294191#comment-14294191 ] Stephan Ewen commented on FLINK-1458: - John, this is a limitation that has been fixed and is about to be merged. [~aljoscha] can give you the details. In general, using abstract types is a bit less efficient than using concrete types (since subclass information has to be carried with the code), but it will work. > Interfaces and abstract classes are not valid types > --- > > Key: FLINK-1458 > URL: https://issues.apache.org/jira/browse/FLINK-1458 > Project: Flink > Issue Type: Bug >Reporter: John Sandiford > > I don't know whether this is by design or is a bug, but I am having trouble > working with DataSet and traits in scala which is a major limitation. A > simple example is shown below. > Compile time warning is 'Type Main.SimpleTrait has no fields that are visible > from Scala Type analysis. Falling back to Java Type Analysis...' > Run time error is 'Interfaces and abstract classes are not valid types: > interface Main$SimpleTrait' > Regards, John
> val env = ExecutionEnvironment.getExecutionEnvironment
> trait SimpleTrait {
>   def contains(x: String): Boolean
> }
> class SimpleClass extends SimpleTrait {
>   def contains(x: String) = true
> }
> val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
> def f(data: DataSet[Double]): DataSet[SimpleTrait] = {
>   data.mapPartition(iterator => {
>     Iterator(new SimpleClass)
>   })
> }
> val g = f(data)
> g.print()
> env.execute("Simple example")
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294187#comment-14294187 ] ASF GitHub Bot commented on FLINK-377: -- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71727262 @dan-blanchard That is fine. Thank you for the pointer to Storm's multilang protocol. We'll have a look at it and see whether we can make something similar work with Flink. > Create a general purpose framework for language bindings > > > Key: FLINK-377 > URL: https://issues.apache.org/jira/browse/FLINK-377 > Project: Flink > Issue Type: Improvement >Reporter: GitHub Import > Labels: github-import > Fix For: pre-apache > > > A general purpose API to run operators with arbitrary binaries. > This will allow to run Stratosphere programs written in Python, JavaScript, > Ruby, Go or whatever you like. > We suggest using Google Protocol Buffers for data serialization. This is the > list of languages that currently support ProtoBuf: > https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns > Very early prototype with python: > https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing > protobuf) > For Ruby: https://github.com/infochimps-labs/wukong > Two new students working at Stratosphere (@skunert and @filiphaase) are > working on this. > The reference binding language will be for Python, but other bindings are > very welcome. > The best name for this so far is "stratosphere-lang-bindings". > I created this issue to track the progress (and give everybody a chance to > comment on this) > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/377 > Created by: [rmetzger|https://github.com/rmetzger] > Labels: enhancement, > Assignee: [filiphaase|https://github.com/filiphaase] > Created at: Tue Jan 07 19:47:20 CET 2014 > State: open
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71727262 @dan-blanchard That is fine. Thank you for the pointer to Storm's multilang protocol. We'll have a look at it and see whether we can make something similar work with Flink.
[jira] [Resolved] (FLINK-1378) could not find implicit value for evidence parameter of type TypeInformation
[ https://issues.apache.org/jira/browse/FLINK-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1378. - Resolution: Fixed Fix Version/s: 0.8 0.9 Fixed via 935e316a38367cab513dfe2b010129e5d47b7b68 > could not find implicit value for evidence parameter of type TypeInformation > > > Key: FLINK-1378 > URL: https://issues.apache.org/jira/browse/FLINK-1378 > Project: Flink > Issue Type: Bug > Components: Scala API >Affects Versions: 0.7.0-incubating >Reporter: John Sandiford >Assignee: Aljoscha Krettek > Fix For: 0.9, 0.8 > > > This is an example of one of many cases that I cannot get to compile with the > scala API. I have tried using T : TypeInformation and : ClassTag but still > cannot get it to work.
> //libraryDependencies += "org.apache.flink" % "flink-scala" % "0.7.0-incubating"
> //libraryDependencies += "org.apache.flink" % "flink-clients" % "0.7.0-incubating"
> import org.apache.flink.api.scala._
> import scala.util.{Success, Try}
> object Main extends App {
>   val env = ExecutionEnvironment.getExecutionEnvironment
>   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
>   def f[T](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = {
>     data.mapPartition((iterator: Iterator[T]) => {
>       val first = iterator.next()
>       val second = iterator.next()
>       Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, 1.0))))
>     })
>   }
>   val g = f(data)
>   g.print()
>   env.execute("Flink Test")
> }
[jira] [Resolved] (FLINK-1458) Interfaces and abstract classes are not valid types
[ https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1458. -- Resolution: Duplicate Duplicate of FLINK-1369
[jira] [Commented] (FLINK-1458) Interfaces and abstract classes are not valid types
[ https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294182#comment-14294182 ] Fabian Hueske commented on FLINK-1458: -- Yes, this is a limitation at the moment but the good news is there is already a pull request which should solve this issue (https://github.com/apache/flink/pull/316). :-) I will close this issue as it is a duplicate of FLINK-1369
[GitHub] flink pull request: Implement the convenience methods count and co...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/210#issuecomment-71726730 @zentol You are right that, for the time being, this partly results in repeated execution. While not totally avoidable in all cases, the code going in soon about caching intermediate results will help there big time.
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71722351 We can bind the `unlink` to the `pre-clean` phase, see if that helps. All in all, if it does not work, it does not work. This is a nice utility, by no means crucial enough to spend huge amounts of time on it...
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294137#comment-14294137 ] ASF GitHub Bot commented on FLINK-1330: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71722351 We can bind the `unlink` to the `pre-clean` phase, see if that helps. All in all, if it does not work, it does not work. This is a nice utility, but by no means crucial enough to spend huge amounts of time on. > Restructure directory layout > > > Key: FLINK-1330 > URL: https://issues.apache.org/jira/browse/FLINK-1330 > Project: Flink > Issue Type: Improvement > Components: Build System, Documentation >Reporter: Max Michels >Priority: Minor > Labels: usability > > When building Flink, the build results can currently be found under > "flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/". > I think we could improve the directory layout with the following: > - provide the bin folder in the root by default > - let the start-up and submission scripts in bin assemble the class path > - in case the project hasn't been built yet, inform the user > The changes would make it easier to work with Flink from source.
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user dan-blanchard commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71705775 @zentol Thanks for the info. I'm sorry to derail the PR conversation here for a bit!
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293973#comment-14293973 ] ASF GitHub Bot commented on FLINK-377: -- Github user dan-blanchard commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71705775 @zentol Thanks for the info. I'm sorry to derail the PR conversation here for a bit! > Create a general purpose framework for language bindings > > > Key: FLINK-377 > URL: https://issues.apache.org/jira/browse/FLINK-377 > Project: Flink > Issue Type: Improvement >Reporter: GitHub Import > Labels: github-import > Fix For: pre-apache > > > A general purpose API to run operators with arbitrary binaries. > This will allow to run Stratosphere programs written in Python, JavaScript, > Ruby, Go or whatever you like. > We suggest using Google Protocol Buffers for data serialization. This is the > list of languages that currently support ProtoBuf: > https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns > Very early prototype with python: > https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing > protobuf) > For Ruby: https://github.com/infochimps-labs/wukong > Two new students working at Stratosphere (@skunert and @filiphaase) are > working on this. > The reference binding language will be for Python, but other bindings are > very welcome. > The best name for this so far is "stratosphere-lang-bindings". > I created this issue to track the progress (and give everybody a chance to > comment on this) > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/377 > Created by: [rmetzger|https://github.com/rmetzger] > Labels: enhancement, > Assignee: [filiphaase|https://github.com/filiphaase] > Created at: Tue Jan 07 19:47:20 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1458) Interfaces and abstract classes are not valid types
[ https://issues.apache.org/jira/browse/FLINK-1458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Sandiford updated FLINK-1458: -- Description: I don't know whether this is by design or is a bug, but I am having trouble working with DataSet and traits in scala which is a major limitation. A simple example is shown below. Compile time warning is 'Type Main.SimpleTrait has no fields that are visible from Scala Type analysis. Falling back to Java Type Analysis...' Run time error is 'Interfaces and abstract classes are not valid types: interface Main$SimpleTrait' Regards, John val env = ExecutionEnvironment.getExecutionEnvironment trait SimpleTrait { def contains(x: String): Boolean } class SimpleClass extends SimpleTrait { def contains(x: String) = true } val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0) def f(data: DataSet[Double]): DataSet[SimpleTrait] = { data.mapPartition(iterator => { Iterator(new SimpleClass) }) } val g = f(data) g.print() env.execute("Simple example") was: I don't know whether this is by design or is a bug, but I am having trouble working with DataSet and traits in scala which is a major limitation. A simple example is shown below. Compile time warning is 'Type Main.SimpleTrait has no fields that are visible from Scala Type analysis. Falling back to Java Type Analysis...' 
Run time error is 'Interfaces and abstract classes are not valid types: interface Main$SimpleTrait' Regards, John val env = ExecutionEnvironment.getExecutionEnvironment trait SimpleTrait { def contains(x: String): Boolean } class SimpleClass extends SimpleTrait { def contains(x: String) = true } val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0) def f(data: DataSet[Double]): DataSet[SimpleTrait] = { data.mapPartition(iterator => { Iterator(new SimpleClass) }) } val g = f(data) g.print() > Interfaces and abstract classes are not valid types > --- > > Key: FLINK-1458 > URL: https://issues.apache.org/jira/browse/FLINK-1458 > Project: Flink > Issue Type: Bug >Reporter: John Sandiford > > I don't know whether this is by design or is a bug, but I am having trouble > working with DataSet and traits in scala which is a major limitation. A > simple example is shown below. > Compile time warning is 'Type Main.SimpleTrait has no fields that are visible > from Scala Type analysis. Falling back to Java Type Analysis...' > Run time error is 'Interfaces and abstract classes are not valid types: > interface Main$SimpleTrait' > Regards, John > val env = ExecutionEnvironment.getExecutionEnvironment > trait SimpleTrait { > def contains(x: String): Boolean > } > class SimpleClass extends SimpleTrait { > def contains(x: String) = true > } > val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0) > def f(data: DataSet[Double]): DataSet[SimpleTrait] = { > data.mapPartition(iterator => { > Iterator(new SimpleClass) > }) > } > val g = f(data) > g.print() > env.execute("Simple example") -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-1458) Interfaces and abstract classes are not valid types
John Sandiford created FLINK-1458: - Summary: Interfaces and abstract classes are not valid types Key: FLINK-1458 URL: https://issues.apache.org/jira/browse/FLINK-1458 Project: Flink Issue Type: Bug Reporter: John Sandiford I don't know whether this is by design or is a bug, but I am having trouble working with DataSet and traits in scala which is a major limitation. A simple example is shown below. Compile time warning is 'Type Main.SimpleTrait has no fields that are visible from Scala Type analysis. Falling back to Java Type Analysis...' Run time error is 'Interfaces and abstract classes are not valid types: interface Main$SimpleTrait' Regards, John val env = ExecutionEnvironment.getExecutionEnvironment trait SimpleTrait { def contains(x: String): Boolean } class SimpleClass extends SimpleTrait { def contains(x: String) = true } val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0) def f(data: DataSet[Double]): DataSet[SimpleTrait] = { data.mapPartition(iterator => { Iterator(new SimpleClass) }) } val g = f(data) g.print() -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1378) could not find implicit value for evidence parameter of type TypeInformation
[ https://issues.apache.org/jira/browse/FLINK-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293869#comment-14293869 ] John Sandiford commented on FLINK-1378: --- Yes, the fix works. Thank you. > could not find implicit value for evidence parameter of type TypeInformation > > > Key: FLINK-1378 > URL: https://issues.apache.org/jira/browse/FLINK-1378 > Project: Flink > Issue Type: Bug > Components: Scala API >Affects Versions: 0.7.0-incubating >Reporter: John Sandiford >Assignee: Aljoscha Krettek > > This is an example of one of many cases that I cannot get to compile with the > scala API. I have tried using T : TypeInformation and : ClassTag but still > cannot get it to work. > //libraryDependencies += "org.apache.flink" % "flink-scala" % > "0.7.0-incubating" > // > //libraryDependencies += "org.apache.flink" % "flink-clients" % > "0.7.0-incubating" > import org.apache.flink.api.scala._ > import scala.util.{Success, Try} > object Main extends App { > val env = ExecutionEnvironment.getExecutionEnvironment > val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0) > def f[T](data: DataSet[T]): DataSet[(T, Try[Seq[Double]])] = { > data.mapPartition((iterator: Iterator[T]) => { > val first = iterator.next() > val second = iterator.next() > Iterator((first, Success(Seq(2.0, 3.0))), (second, Success(Seq(3.0, > 1.0 > }) > } > val g = f(data) > g.print() > env.execute("Flink Test") > } -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: Implement the convenience methods count and co...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/210#issuecomment-71689597 Looks like this is now ready to merge. @zentol I understand your concern. However, I think that it is much easier to execute it this way. Most of the time, the user probably wants just one accumulator result, not multiple. This is supposed to be a convenience function.
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293833#comment-14293833 ] ASF GitHub Bot commented on FLINK-377: -- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71688094 @dan-blanchard generally, how easy it is will depend on what you're going for. you can implement the functionality to create a plan in the given language; or just leave that out and focus on udf's. (this means writing plans in java though!) for UDF's you can decide whether you want to create a complete framework with different operations and driver strategy's (map, cogroup, reduce etc.), or just provide the ability to receive/send values. the only common core is the data exchange between the given language and java, which for example in python takes roughly 300 lines of code. (data is itself is ~70, rest is serialization) > Create a general purpose framework for language bindings > > > Key: FLINK-377 > URL: https://issues.apache.org/jira/browse/FLINK-377 > Project: Flink > Issue Type: Improvement >Reporter: GitHub Import > Labels: github-import > Fix For: pre-apache > > > A general purpose API to run operators with arbitrary binaries. > This will allow to run Stratosphere programs written in Python, JavaScript, > Ruby, Go or whatever you like. > We suggest using Google Protocol Buffers for data serialization. This is the > list of languages that currently support ProtoBuf: > https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns > Very early prototype with python: > https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing > protobuf) > For Ruby: https://github.com/infochimps-labs/wukong > Two new students working at Stratosphere (@skunert and @filiphaase) are > working on this. > The reference binding language will be for Python, but other bindings are > very welcome. > The best name for this so far is "stratosphere-lang-bindings". 
> I created this issue to track the progress (and give everybody a chance to > comment on this) > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/377 > Created by: [rmetzger|https://github.com/rmetzger] > Labels: enhancement, > Assignee: [filiphaase|https://github.com/filiphaase] > Created at: Tue Jan 07 19:47:20 CET 2014 > State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71688094 @dan-blanchard Generally, how easy it is will depend on what you're going for. You can implement the functionality to create a plan in the given language, or just leave that out and focus on UDFs (this means writing plans in Java, though!). For UDFs you can decide whether you want to create a complete framework with different operations and driver strategies (map, coGroup, reduce, etc.), or just provide the ability to receive/send values. The only common core is the data exchange between the given language and Java, which for example in Python takes roughly 300 lines of code (the data exchange itself is ~70, the rest is serialization).
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71672497 Whenever I look more closely at the DC I'm always left wondering how it can work at all. About your first point, I don't think that's enough; there is a more fundamental flaw, and we need another counter for delete processes. Consider the following 2 scenarios with 2 tasks distributing the same file. C denotes the creation of a copy process, D denotes a delete process, # denotes the count variable, and O the oldCount variable. ``` 1): I II III IV T1:---CD T2:---CD--- # 1 22 2 O 2 2 2)I II III IV T1:---CD--- T2:---CD # 1 22 2 O 2 2 ``` In both scenarios, D at III should not delete the file, but all D's have the very same information. Instead, I propose having 2 counters: one counting the # of copy operations, and one counting the # of delete operations, with the current value (at process creation) stored in the process. When executing, if the stored value is equal to the copy count, files may be deleted, since this means that this delete process was the last to be started. Let's make another fancy schema to illustrate the point: ``` 1): I II III IV T1:---CD T2:---CD--- # 1 22 2 O 1 2 2)I II III IV T1:---CD--- T2:---CD # 1 22 2 O 1 2 ```
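The two-counter idea sketched above can be illustrated in a few lines. This is a hypothetical sketch, not Flink's actual DistributedCache code; all names (`FileEntry`, `start_copy`, `create_delete_process`) are invented for illustration. Each delete process snapshots the copy count at its creation and deletes only if no further copy was started afterwards:

```python
# Minimal sketch of the proposed scheme: a delete process may remove the
# file only if it was created after the last copy process, i.e. the copy
# count it saw at creation still equals the current copy count.
class FileEntry:
    def __init__(self):
        self.copy_count = 0    # number of copy processes started ("C")
        self.file_exists = False

    def start_copy(self):
        self.copy_count += 1
        self.file_exists = True

    def create_delete_process(self):
        # snapshot the copy count at creation time (the "old count")
        seen = self.copy_count

        def run_delete():
            # a stale delete (a copy started after it) must not remove the file
            if seen == self.copy_count:
                self.file_exists = False
        return run_delete

# Scenario from the comment: T1 copies, T2 copies, then both deletes run.
entry = FileEntry()
entry.start_copy()                  # T1: C  (copy_count = 1)
d1 = entry.create_delete_process()  # T1: D scheduled, sees count 1
entry.start_copy()                  # T2: C  (copy_count = 2)
d2 = entry.create_delete_process()  # T2: D scheduled, sees count 2
d1()                                # stale: 1 != 2, file is kept
d2()                                # last delete: 2 == 2, file is removed
```

Here the first delete corresponds to "D at III" in the diagrams: because its snapshot no longer matches the copy count, it is a no-op.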
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293723#comment-14293723 ] ASF GitHub Bot commented on FLINK-1419: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71672497 Whenever I look more closely at the DC I'm always left wondering how it can work at all. About your first point, i don't think thats enough. there is a more fundamental flaw, we need another counter for delete processes. consider the following 2 scenarios with 2 tasks distributing the same file. C denotes the creating of a copying process, D denotes deleting process. # denotes the count variable, O the oldCount variable. ``` 1): I II III IV T1:---CD T2:---CD--- # 1 22 2 O 2 2 2)I II III IV T1:---CD--- T2:---CD # 1 22 2 O 2 2 ``` In both scenarios, D at III should not delete the file, but all D's have the very same information. instead, i propose having 2 counters, one counting the # of copy operations; and one counting the # of delete operations, with the current value (at process creation) stored in the process. when executing, if the current value is equal to the copy count, files may be deleted, since this means that this delete process was the last to be started. let's make another fancy schema to illustrate the point: ``` 1): I II III IV T1:---CD T2:---CD--- # 1 22 2 O 1 2 2)I II III IV T1:---CD--- T2:---CD # 1 22 2 O 1 2 ``` > DistributedCache doesn't preserver files for subsequent operations > -- > > Key: FLINK-1419 > URL: https://issues.apache.org/jira/browse/FLINK-1419 > Project: Flink > Issue Type: Bug >Affects Versions: 0.8, 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler > > When subsequent operations want to access the same files in the DC it > frequently happens that the files are not created for the following operation. 
> This is fairly odd, since the DC is supposed to either a) preserve files when > another operation kicks in within a certain time window, or b) just recreate > the deleted files. Both things don't happen. > Increasing the time window had no effect. > I'd like to use this issue as a starting point for a more general discussion > about the DistributedCache. > Currently: > 1. all files reside in a common job-specific directory > 2. are deleted during the job. > > One thing that was brought up about Trait 1 is that it basically forbids > modification of the files, concurrent access and all. Personally I'm not sure > if this a problem. Changing it to a task-specific place solved the issue > though. > I'm more concerned about Trait #2. Besides the mentioned issue, the deletion > is realized with the scheduler, which adds a lot of complexity to the current > code. (It really is a pain to work on...) > If we moved the deletion to the end of the job it could be done as a clean-up > step in the TaskManager, With this we could reduce the DC to a > cacheFile(String source) method, the delete method in the TM, and throw out > everything else. > Also, the current implementation implies that big files may be copied > multiple times. This may be undesired, depending on how big the files are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293612#comment-14293612 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23611202 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- Oh yes, you're right. Then it's perfectly fine. > Change start-cluster.sh script so that users don't have to configure the > JobManager address > --- > > Key: FLINK-938 > URL: https://issues.apache.org/jira/browse/FLINK-938 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger >Assignee: Mingliang Qi >Priority: Minor > Fix For: 0.9 > > > To improve the user experience, Flink should not require users to configure > the JobManager's address on a cluster. > In combination with FLINK-934, this would allow running Flink with decent > performance on a cluster without setting a single configuration value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-938] Auomatically configure the jobmana...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23611202 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- Oh yes, you're right. Then it's perfectly fine.
[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/345#issuecomment-71656804 Good to merge
[jira] [Commented] (FLINK-1457) RAT check fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293610#comment-14293610 ] ASF GitHub Bot commented on FLINK-1457: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/345#issuecomment-71656804 Good to merge > RAT check fails on Windows > -- > > Key: FLINK-1457 > URL: https://issues.apache.org/jira/browse/FLINK-1457 > Project: Flink > Issue Type: Bug >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Trivial > > On (my) Windows 7 (Maven 3.2.2), the RAT check fails as > flink-addons/flink-avro/src/test/resources/testdata.avro has no approved > license. Not being an actual code file, it should be excluded from the RAT > check so that verification also passes on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1457] exclude avro test file from RAT c...
GitHub user sekruse opened a pull request: https://github.com/apache/flink/pull/345 [FLINK-1457] exclude avro test file from RAT check You can merge this pull request into a Git repository by running: $ git pull https://github.com/sekruse/flink FLINK-1475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #345 commit 1a1adcc5c9d372d22c50167525aaa0f1d3eb4d84 Author: Sebastian Kruse Date: 2015-01-27T14:24:01Z [FLINK-1457] exclude avro test file from RAT check
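For context, a RAT exclusion like the one this pull request adds is typically configured in the Maven build via the `apache-rat-plugin`'s `excludes` list. The snippet below is a sketch of that pattern, not the exact change merged into Flink's pom.xml; the plugin version and surrounding configuration may differ:

```xml
<!-- Sketch: exclude a binary test resource from the Apache RAT license check. -->
<plugin>
  <groupId>org.apache.rat</groupId>
  <artifactId>apache-rat-plugin</artifactId>
  <configuration>
    <excludes>
      <!-- Avro binary test data, not a source file; has no license header. -->
      <exclude>flink-addons/flink-avro/src/test/resources/testdata.avro</exclude>
    </excludes>
  </configuration>
</plugin>
```

With such an entry in place, `mvn verify` no longer fails on the unlicensed binary file, which is what makes the build pass on Windows as well.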
[jira] [Commented] (FLINK-1457) RAT check fails on Windows
[ https://issues.apache.org/jira/browse/FLINK-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293590#comment-14293590 ] ASF GitHub Bot commented on FLINK-1457: --- GitHub user sekruse opened a pull request: https://github.com/apache/flink/pull/345 [FLINK-1457] exclude avro test file from RAT check You can merge this pull request into a Git repository by running: $ git pull https://github.com/sekruse/flink FLINK-1475 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/345.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #345 commit 1a1adcc5c9d372d22c50167525aaa0f1d3eb4d84 Author: Sebastian Kruse Date: 2015-01-27T14:24:01Z [FLINK-1457] exclude avro test file from RAT check > RAT check fails on Windows > -- > > Key: FLINK-1457 > URL: https://issues.apache.org/jira/browse/FLINK-1457 > Project: Flink > Issue Type: Bug >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Trivial > > On (my) Windows 7 (Maven 3.2.2), the RAT check fails as > flink-addons/flink-avro/src/test/resources/testdata.avro has no approved > license. Not being an actual code file, it should be excluded from the RAT > check so that verification also passes on Windows. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293588#comment-14293588 ] ASF GitHub Bot commented on FLINK-938: -- Github user qmlmoon commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23610509 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- In local mode, jobmanager will start an internal taskmanager and call the `TaskManager.parseConfiguration`. For TaskManager, we can replace the null default value with the `config.defaultJobManagerAdd` but not the `InetAddress.getLocalHost.getHostName`. I think the question is whether we add an additional parameter to the `parseConfiguration` for the default jobmanager address or put this in the configuration. > Change start-cluster.sh script so that users don't have to configure the > JobManager address > --- > > Key: FLINK-938 > URL: https://issues.apache.org/jira/browse/FLINK-938 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger >Assignee: Mingliang Qi >Priority: Minor > Fix For: 0.9 > > > To improve the user experience, Flink should not require users to configure > the JobManager's address on a cluster. > In combination with FLINK-934, this would allow running Flink with decent > performance on a cluster without setting a single configuration value. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-938] Auomatically configure the jobmana...
Github user qmlmoon commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23610509 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- In local mode, the JobManager will start an internal TaskManager and call `TaskManager.parseConfiguration`. For the TaskManager, we can replace the null default value with `config.defaultJobManagerAdd`, but not with `InetAddress.getLocalHost.getHostName`. I think the question is whether we add an additional parameter to `parseConfiguration` for the default JobManager address or put this in the configuration.
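The pattern debated in the diff above (fall back to the local host name when no JobManager address is configured) can be sketched language-agnostically. This is an illustrative sketch only; the function name and the configuration key are invented here and do not reflect Flink's real API:

```python
import socket

# Hypothetical sketch of the fallback in the Scala diff above: if the
# JobManager address key is unset, default to the local host name
# (mirroring InetAddress.getLocalHost.getHostName).
def resolve_jobmanager_address(config: dict) -> str:
    address = config.get("jobmanager.address")  # illustrative key name
    if address is None:
        address = socket.gethostname()
    return address
```

The review discussion is about where this default belongs: baking it into the configuration (as the diff does) also affects the TaskManager started in local mode, whereas passing it as an explicit parameter keeps the two defaults independent.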
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71654749 Both good points. I'll address them after lunch!
[jira] [Created] (FLINK-1457) RAT check fails on Windows
Sebastian Kruse created FLINK-1457: -- Summary: RAT check fails on Windows Key: FLINK-1457 URL: https://issues.apache.org/jira/browse/FLINK-1457 Project: Flink Issue Type: Bug Reporter: Sebastian Kruse Assignee: Sebastian Kruse Priority: Trivial On (my) Windows 7 (Maven 3.2.2), the RAT check fails as flink-addons/flink-avro/src/test/resources/testdata.avro has no approved license. Not being an actual code file, it should be excluded from the RAT check so that verification also passes on Windows.
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserver files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293584#comment-14293584 ] ASF GitHub Bot commented on FLINK-1419: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71654749 both good points. I'll address them after lunch! > DistributedCache doesn't preserver files for subsequent operations > -- > > Key: FLINK-1419 > URL: https://issues.apache.org/jira/browse/FLINK-1419 > Project: Flink > Issue Type: Bug >Affects Versions: 0.8, 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler > > When subsequent operations want to access the same files in the DC it > frequently happens that the files are not created for the following operation. > This is fairly odd, since the DC is supposed to either a) preserve files when > another operation kicks in within a certain time window, or b) just recreate > the deleted files. Both things don't happen. > Increasing the time window had no effect. > I'd like to use this issue as a starting point for a more general discussion > about the DistributedCache. > Currently: > 1. all files reside in a common job-specific directory > 2. are deleted during the job. > > One thing that was brought up about Trait 1 is that it basically forbids > modification of the files, concurrent access and all. Personally I'm not sure > if this a problem. Changing it to a task-specific place solved the issue > though. > I'm more concerned about Trait #2. Besides the mentioned issue, the deletion > is realized with the scheduler, which adds a lot of complexity to the current > code. (It really is a pain to work on...) > If we moved the deletion to the end of the job it could be done as a clean-up > step in the TaskManager, With this we could reduce the DC to a > cacheFile(String source) method, the delete method in the TM, and throw out > everything else. 
> Also, the current implementation implies that big files may be copied > multiple times. This may be undesired, depending on how big the files are. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
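The redesign floated in the thread above (copy a file once per job, count registrations, and do a single clean-up step at job end in the TaskManager) can be sketched as follows. All names here are hypothetical illustrations, not Flink's actual DistributedCache API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a job-scoped cache registry that defers deletion to
// one clean-up step at job end, instead of per-task timed deletion.
public class CacheSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // cacheFile(String source): the first registration would trigger the
    // actual copy; later registrations only bump the counter, so the file
    // is copied at most once per job.
    public synchronized void cacheFile(String source) {
        refCounts.merge(source, 1, Integer::sum);
    }

    // Clean-up step run by the TaskManager when the job finishes: delete the
    // whole job directory at once and forget all entries. Returns how many
    // distinct files were cached, for illustration.
    public synchronized int cleanupJob() {
        int removed = refCounts.size();
        refCounts.clear();
        return removed;
    }

    public synchronized int refCount(String source) {
        return refCounts.getOrDefault(source, 0);
    }
}
```

With this shape, the scheduler-driven delete processes mentioned in the discussion disappear entirely; the only deletion site is `cleanupJob()`.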
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293561#comment-14293561 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23609808 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- That is true for the TaskManager but not for the JobManager. For the TaskManager, we could replace the null default value with ```InetAddress.getLocalHost.getHostName``` at both places. > Change start-cluster.sh script so that users don't have to configure the > JobManager address > --- > > Key: FLINK-938 > URL: https://issues.apache.org/jira/browse/FLINK-938 > Project: Flink > Issue Type: Improvement > Components: Build System >Reporter: Robert Metzger >Assignee: Mingliang Qi >Priority: Minor > Fix For: 0.9 > > > To improve the user experience, Flink should not require users to configure > the JobManager's address on a cluster. > In combination with FLINK-934, this would allow running Flink with decent > performance on a cluster without setting a single configuration value.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23609808 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- That is true for the TaskManager but not for the JobManager. For the TaskManager, we could replace the null default value with ```InetAddress.getLocalHost.getHostName``` at both places. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293556#comment-14293556 ] ASF GitHub Bot commented on FLINK-1419: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71652771 I'm wondering whether the count hash map update should rather happen in the copy process. Because otherwise there could be the following interleaving: 1. You register a new temp file "foobar" for task B --> creating a copy task and increment file counter 2. You delete the temp file "foobar" for task A because it is finished --> creating a delete process with the incremented counter 3. You execute the copy process 4. You execute the delete process Then the file "foobar" does not exist for task B. Another thing is that the DeleteProcess tries to delete the whole directory below the jobID if one file shall be deleted. I don't know whether this is the right behaviour.
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71652771 I'm wondering whether the count hash map update should rather happen in the copy process. Because otherwise there could be the following interleaving: 1. You register a new temp file "foobar" for task B --> creating a copy task and increment file counter 2. You delete the temp file "foobar" for task A because it is finished --> creating a delete process with the incremented counter 3. You execute the copy process 4. You execute the delete process Then the file "foobar" does not exist for task B. Another thing is that the DeleteProcess tries to delete the whole directory below the jobID if one file shall be deleted. I don't know whether this is the right behaviour.
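The interleaving described in the comment above comes down to when the reference count is updated relative to the copy and delete processes. A minimal illustration (not Flink's actual code) of a counting scheme that avoids the race: a registration increments the count immediately, and a finishing task's file is only deleted once the count drops to zero, so task B's registration keeps "foobar" alive across task A's clean-up:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative reference counting for cached files (hypothetical names).
public class RefCountedFiles {
    private final ConcurrentMap<String, AtomicInteger> counts = new ConcurrentHashMap<>();

    // Registering a file for a task bumps its count right away, before any
    // asynchronous copy process runs.
    public void register(String name) {
        counts.computeIfAbsent(name, k -> new AtomicInteger()).incrementAndGet();
    }

    // A finishing task decrements the count; only when no other task holds a
    // reference is it safe to delete, and then only this single file, not the
    // whole job directory.
    public boolean release(String name) {
        AtomicInteger c = counts.get(name);
        if (c != null && c.decrementAndGet() == 0) {
            counts.remove(name);
            return true; // caller may delete the file now
        }
        return false; // still referenced elsewhere
    }
}
```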
[jira] [Closed] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1352. Resolution: Fixed Fixed in 730e056a2a2ea028495637b633396392c31337e3 > Buggy registration from TaskManager to JobManager > - > > Key: FLINK-1352 > URL: https://issues.apache.org/jira/browse/FLINK-1352 > Project: Flink > Issue Type: Bug > Components: JobManager, TaskManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Till Rohrmann > Fix For: 0.9 > > > The JobManager's InstanceManager may refuse the registration attempt from a > TaskManager, because it has this taskmanager already connected, or, in the > future, because the TaskManager has been blacklisted as unreliable. > Upon refused registration, the instance ID is null to signal the refusal. > The TaskManager reacts incorrectly to such messages, assuming successful > registration. > Possible solution: JobManager sends back a dedicated "RegistrationRefused" > message, if the instance manager returns null as the registration result. If > the TaskManager receives that before being registered, it knows that the > registration response was lost (which should not happen on TCP and would > indicate a corrupt connection). > Followup question: Does it make sense to have the TaskManager retry > indefinitely to connect to the JobManager, with increasing interval (from > seconds to minutes)?
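The fix proposed in the issue, a dedicated refusal message instead of a null instance ID, could look roughly like this. The class and message names are illustrative, not the actual Flink actor messages:

```java
// Sketch: the JobManager answers every registration attempt with an explicit
// message, so the TaskManager can never mistake a refusal for success.
public class RegistrationSketch {
    interface RegistrationResponse {}

    static final class AcknowledgeRegistration implements RegistrationResponse {
        final String instanceId;
        AcknowledgeRegistration(String instanceId) { this.instanceId = instanceId; }
    }

    static final class RefuseRegistration implements RegistrationResponse {
        final String reason;
        RefuseRegistration(String reason) { this.reason = reason; }
    }

    // JobManager side: translate a null instance ID from the InstanceManager
    // into a dedicated refusal message instead of returning null to the peer.
    static RegistrationResponse respond(String instanceIdOrNull) {
        return instanceIdOrNull == null
            ? new RefuseRegistration("TaskManager already registered or blacklisted")
            : new AcknowledgeRegistration(instanceIdOrNull);
    }
}
```

On the TaskManager side, receiving `RefuseRegistration` while believing itself registered would then be the signal that an earlier response was lost.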
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293536#comment-14293536 ] ASF GitHub Bot commented on FLINK-1352: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/328
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/328
[GitHub] flink pull request: [FLINK-1436] refactor CliFrontend to provide m...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/331#issuecomment-71649409 Thanks for your feedback, @rmetzger. The error message is at the bottom because that way it is most easily identifiable by the user (no scrolling necessary). Before, we printed the error and then the help, which let the help shadow the error message. I changed the error reporting in case the user didn't specify an action. Concerning the printing of the help message, you're probably right. Let's just print the help if the user asks for it. Now it prints:
> "./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar" is not a valid action.
> Valid actions are "run", "list", "info", or "cancel".
Additionally, let's
* change `info` to print the plan by default
* change `cancel` to accept the job id as a parameter instead of an option
* change `list` to print scheduled and running jobs by default
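The action check quoted above can be sketched as follows; the class and method names are illustrative, not the actual CliFrontend implementation:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of validating the first CLI argument against the known actions and
// producing the error text printed at the bottom of the output.
public class CliActionCheck {
    static final List<String> ACTIONS = Arrays.asList("run", "list", "info", "cancel");

    // Returns null for a valid action, otherwise the error message.
    public static String check(String action) {
        if (ACTIONS.contains(action)) {
            return null;
        }
        return "\"" + action + "\" is not a valid action.\n"
            + "Valid actions are \"run\", \"list\", \"info\", or \"cancel\".";
    }
}
```

Printing this message last, after any other output, matches the rationale in the comment: the user sees the error without scrolling.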
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293527#comment-14293527 ] ASF GitHub Bot commented on FLINK-938: -- Github user qmlmoon commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607815 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- That was my first thought, but there is another part in the `TaskManager.parseConfiguration` method that reads the JobManager address from the configuration.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user qmlmoon commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607815 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- That was my first thought, but there is another part in the `TaskManager.parseConfiguration` method that reads the JobManager address from the configuration.
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293525#comment-14293525 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607703 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -596,6 +596,10 @@ object TaskManager { opt[String]("tempDir") optional() action { (x, c) => c.copy(tmpDir = x) } text ("Specify temporary directory.") + + opt[String]("defaultJobManagerAdd") optional() action { (x, c) => --- End diff -- I'm slightly in favour of writing the option completely out: defaultJobManagerAddress
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607703 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -596,6 +596,10 @@ object TaskManager { opt[String]("tempDir") optional() action { (x, c) => c.copy(tmpDir = x) } text ("Specify temporary directory.") + + opt[String]("defaultJobManagerAdd") optional() action { (x, c) => --- End diff -- I'm slightly in favour of writing the option completely out: defaultJobManagerAddress
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293523#comment-14293523 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607635 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -611,6 +615,13 @@ object TaskManager { configuration.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, config.tmpDir) } +if (config.defaultJobManagerAdd != null && GlobalConfiguration.getString(ConfigConstants + .JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +config.defaultJobManagerAdd) +} + val jobManagerHostname = configuration.getString(ConfigConstants .JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- The same question as above.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607635 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -611,6 +615,13 @@ object TaskManager { configuration.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, config.tmpDir) } +if (config.defaultJobManagerAdd != null && GlobalConfiguration.getString(ConfigConstants + .JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +config.defaultJobManagerAdd) +} + val jobManagerHostname = configuration.getString(ConfigConstants .JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- The same question as above.
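The behavior under review in these diffs, defaulting the JobManager address to the local hostname only when nothing is configured, can be sketched in plain Java. This is a simplification, not Flink's `GlobalConfiguration` API; the key string stands in for `ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY`:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;

// Sketch: an explicitly configured JobManager address always wins; only in
// its absence do we fall back to the local hostname.
public class AddressFallback {
    static final String JOB_MANAGER_IPC_ADDRESS_KEY = "jobmanager.rpc.address";

    public static String jobManagerAddress(Map<String, String> config) {
        String configured = config.get(JOB_MANAGER_IPC_ADDRESS_KEY);
        if (configured != null) {
            return configured;
        }
        try {
            return InetAddress.getLocalHost().getHostName();
        } catch (UnknownHostException e) {
            return "localhost"; // conservative fallback for this sketch
        }
    }
}
```

Expressing the fallback as a default value at the single lookup site, as the reviewer suggests, avoids the separate if-block that mutates the configuration.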
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293521#comment-14293521 ] ASF GitHub Bot commented on FLINK-377: -- Github user dan-blanchard commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71648291 @rmetzger I'm really more curious than anything at this point. I recently worked on a fairly large Storm topology that has parts written in Java, Python, and Perl. As part of that I ended up taking over as the maintainer of IO::Storm, the Perl library for interfacing with Storm via their [Multilang protocol](https://storm.apache.org/documentation/Multilang-protocol.html). Multilang makes it incredibly easy to add support for other languages, so I just wanted to know if you guys were going for something that simple or not. > Create a general purpose framework for language bindings > > > Key: FLINK-377 > URL: https://issues.apache.org/jira/browse/FLINK-377 > Project: Flink > Issue Type: Improvement >Reporter: GitHub Import > Labels: github-import > Fix For: pre-apache > > > A general purpose API to run operators with arbitrary binaries. > This will allow running Stratosphere programs written in Python, JavaScript, > Ruby, Go or whatever you like. > We suggest using Google Protocol Buffers for data serialization. This is the > list of languages that currently support ProtoBuf: > https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns > Very early prototype with python: > https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing > protobuf) > For Ruby: https://github.com/infochimps-labs/wukong > Two new students at Stratosphere (@skunert and @filiphaase) are > working on this. > The reference binding language will be Python, but other bindings are > very welcome. > The best name for this so far is "stratosphere-lang-bindings". 
> I created this issue to track the progress (and give everybody a chance to > comment on this) > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/377 > Created by: [rmetzger|https://github.com/rmetzger] > Labels: enhancement, > Assignee: [filiphaase|https://github.com/filiphaase] > Created at: Tue Jan 07 19:47:20 CET 2014 > State: open
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607471 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- Can't we just set InetAddress.getLocalHost.getHostName as the default value here instead of having the if block above?
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user dan-blanchard commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71648291 @rmetzger I'm really more curious than anything at this point. I recently worked on a fairly large Storm topology that has parts written in Java, Python, and Perl. As part of that I ended up taking over as the maintainer of IO::Storm, the Perl library for interfacing with Storm via their [Multilang protocol](https://storm.apache.org/documentation/Multilang-protocol.html). Multilang makes it incredibly easy to add support for other languages, so I just wanted to know if you guys were going for something that simple or not.
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293520#comment-14293520 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/248#discussion_r23607471 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -519,6 +519,12 @@ object JobManager { configuration.setString(ConfigConstants.FLINK_BASE_DIR_PATH_KEY, config.configDir + "/..") } +if (GlobalConfiguration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, + null) == null) { + configuration.setString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, +InetAddress.getLocalHost.getHostName) +} + val hostname = configuration.getString(ConfigConstants.JOB_MANAGER_IPC_ADDRESS_KEY, null) --- End diff -- Can't we just set InetAddress.getLocalHost.getHostName as the default value here instead of having the if block above?
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293508#comment-14293508 ] ASF GitHub Bot commented on FLINK-938: -- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/248#issuecomment-71647940 Great, thanks.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/248#issuecomment-71647940 Great, thanks.
[jira] [Commented] (FLINK-1352) Buggy registration from TaskManager to JobManager
[ https://issues.apache.org/jira/browse/FLINK-1352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293502#comment-14293502 ] ASF GitHub Bot commented on FLINK-1352: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71647585 I'll merge it.
[GitHub] flink pull request: [FLINK-1352] [runtime] Fix buggy registration ...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/328#issuecomment-71647585 I'll merge it.
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293481#comment-14293481 ] ASF GitHub Bot commented on FLINK-938: -- Github user qmlmoon commented on the pull request: https://github.com/apache/flink/pull/248#issuecomment-71645508 ok. I rebased the PR and reworked it with the Scala implementation.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user qmlmoon commented on the pull request: https://github.com/apache/flink/pull/248#issuecomment-71645508 ok. I rebased the PR and modified it with a Scala implementation.
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293458#comment-14293458 ] Aljoscha Krettek commented on FLINK-1396: - For this to work I have to move the Hadoop Formats from addons to the java (resp. scala) packages. Should I move the input formats and leave the rest intact, or should I duplicate the input formats and leave the addons package as it is? Also, I should probably add direct methods for both the old and the new API. What are your thoughts on this? > Add hadoop input formats directly to the user API. > -- > > Key: FLINK-1396 > URL: https://issues.apache.org/jira/browse/FLINK-1396 > Project: Flink > Issue Type: Bug >Reporter: Robert Metzger >Assignee: Aljoscha Krettek >
[jira] [Updated] (FLINK-1303) HadoopInputFormat does not work with Scala API
[ https://issues.apache.org/jira/browse/FLINK-1303?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-1303: Issue Type: Sub-task (was: Bug) Parent: FLINK-1396 > HadoopInputFormat does not work with Scala API > -- > > Key: FLINK-1303 > URL: https://issues.apache.org/jira/browse/FLINK-1303 > Project: Flink > Issue Type: Sub-task > Components: Scala API >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > Fix For: 0.9 > > > It fails because the HadoopInputFormat uses the Flink Tuple2 type. Because of this, > type extraction fails at runtime.
[jira] [Commented] (FLINK-1419) DistributedCache doesn't preserve files for subsequent operations
[ https://issues.apache.org/jira/browse/FLINK-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293416#comment-14293416 ] ASF GitHub Bot commented on FLINK-1419: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71638812 updated to include discussed changes > DistributedCache doesn't preserve files for subsequent operations > -- > > Key: FLINK-1419 > URL: https://issues.apache.org/jira/browse/FLINK-1419 > Project: Flink > Issue Type: Bug >Affects Versions: 0.8, 0.9 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler > > When subsequent operations want to access the same files in the DC, it > frequently happens that the files are not created for the following operation. > This is fairly odd, since the DC is supposed to either a) preserve files when > another operation kicks in within a certain time window, or b) just recreate > the deleted files. Neither happens. > Increasing the time window had no effect. > I'd like to use this issue as a starting point for a more general discussion > about the DistributedCache. > Currently: > 1. all files reside in a common job-specific directory > 2. all files are deleted during the job. > > One thing that was brought up about Trait #1 is that it basically forbids > modification of the files, concurrent access and all. Personally I'm not sure > if this is a problem. Changing it to a task-specific place solved the issue > though. > I'm more concerned about Trait #2. Besides the mentioned issue, the deletion > is realized with the scheduler, which adds a lot of complexity to the current > code. (It really is a pain to work on...) > If we moved the deletion to the end of the job, it could be done as a clean-up > step in the TaskManager. With this we could reduce the DC to a > cacheFile(String source) method, the delete method in the TM, and throw out > everything else. 
> Also, the current implementation implies that big files may be copied > multiple times. This may be undesired, depending on how big the files are.
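The reduced interface proposed in the thread above can be sketched roughly as follows; the class and method names are made up for illustration and are not Flink's actual API, and the local-path scheme is a placeholder:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the proposal: a single cacheFile(...) entry point,
// with deletion deferred to one clean-up step at the end of the job instead
// of scheduler-driven deletion during the job.
public class SimplifiedCacheSketch {

    /** Files currently materialized for the running job, keyed by source path. */
    private final Map<String, String> materialized = new HashMap<>();

    /** Copies the file once per job and returns the local path; repeated
     *  requests for the same source reuse the existing copy. */
    public String cacheFile(String source) {
        return materialized.computeIfAbsent(source, s -> "/tmp/cache/" + s.hashCode());
    }

    /** Clean-up step run by the TaskManager when the job finishes;
     *  returns the local paths that were removed. */
    public Set<String> cleanupJob() {
        Set<String> removed = new HashSet<>(materialized.values());
        materialized.clear();
        return removed;
    }
}
```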
[GitHub] flink pull request: [FLINK-1419] [runtime] DC properly synchronize...
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/339#issuecomment-71638812 updated to include discussed changes
[GitHub] flink pull request: [FLINK-1330] [build] Build creates a link in t...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71633168 So either we can somehow change the execution order of the clean goal or we fix this in the junction plugin.
[jira] [Commented] (FLINK-1330) Restructure directory layout
[ https://issues.apache.org/jira/browse/FLINK-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293390#comment-14293390 ] ASF GitHub Bot commented on FLINK-1330: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/333#issuecomment-71633168 So either we can somehow change the execution order of the clean goal or we fix this in the junction plugin. > Restructure directory layout > > > Key: FLINK-1330 > URL: https://issues.apache.org/jira/browse/FLINK-1330 > Project: Flink > Issue Type: Improvement > Components: Build System, Documentation >Reporter: Max Michels >Priority: Minor > Labels: usability > > When building Flink, the build results can currently be found under > "flink-root/flink-dist/target/flink-$FLINKVERSION-incubating-SNAPSHOT-bin/flink-$YARNVERSION-$FLINKVERSION-incubating-SNAPSHOT/". > I think we could improve the directory layout with the following: > - provide the bin folder in the root by default > - let the start-up and submission scripts in bin assemble the class path > - in case the project hasn't been built yet, inform the user > The changes would make it easier to work with Flink from source.
[jira] [Resolved] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1433. --- Resolution: Fixed Fix Version/s: 0.9 Assignee: Robert Metzger Fixed for 0.9 in http://git-wip-us.apache.org/repos/asf/flink/commit/a5150a90 Fixed for 0.8.1 in http://git-wip-us.apache.org/repos/asf/flink/commit/2387a08e > Add HADOOP_CLASSPATH to start scripts > - > > Key: FLINK-1433 > URL: https://issues.apache.org/jira/browse/FLINK-1433 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger >Assignee: Robert Metzger > Fix For: 0.9, 0.8.1 > > > With the Hadoop file system wrapper, it's important to have access to the > Hadoop filesystem classes. > The HADOOP_CLASSPATH seems to be a standard environment variable used by > Hadoop for such libraries. > Deployments like Google Compute Cloud set this variable containing the > "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud > Storage in a non-YARN environment, we need to address this issue.
[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293379#comment-14293379 ] ASF GitHub Bot commented on FLINK-1318: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/265#issuecomment-71631281 Looks good to me. How about adding some documentation for this feature? Maybe under http://flink.apache.org/docs/0.8/programming_guide.html#data-sources > Make quoted String parsing optional and configurable for CSVInputFormats > > > Key: FLINK-1318 > URL: https://issues.apache.org/jira/browse/FLINK-1318 > Project: Flink > Issue Type: Improvement > Components: Java API, Scala API >Affects Versions: 0.8 >Reporter: Fabian Hueske >Assignee: Fabian Hueske >Priority: Minor > > With the current implementation of the CSVInputFormat, quoted string parsing > kicks in if the first non-whitespace character of a field is a double quote. > I see two issues with this implementation: > 1. Quoted String parsing cannot be disabled > 2. The quoting character is fixed to double quotes (") > I propose to add parameters to disable quoted String parsing and set the > quote character.
[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/265#issuecomment-71631281 Looks good to me. How about adding some documentation for this feature? Maybe under http://flink.apache.org/docs/0.8/programming_guide.html#data-sources
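To make the two proposed knobs concrete (a flag to disable quoted parsing, and a configurable quote character), here is a rough standalone sketch; the real CsvInputFormat operates on byte arrays rather than Strings, and all names here are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only, not Flink's CsvInputFormat: split one CSV line
// into fields, honoring an enable flag and a configurable quote character.
public class QuotedCsvSketch {
    public static List<String> splitLine(String line, char delimiter,
                                         boolean quotedParsing, char quoteChar) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quotedParsing && c == quoteChar) {
                inQuotes = !inQuotes;        // toggle quoted state, drop the quote
            } else if (c == delimiter && !inQuotes) {
                fields.add(current.toString()); // field boundary outside quotes
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

With quoted parsing disabled, the quote character is treated as an ordinary character, so delimiters inside quotes split the field.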
[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293377#comment-14293377 ] ASF GitHub Bot commented on FLINK-1433: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/337 > Add HADOOP_CLASSPATH to start scripts > - > > Key: FLINK-1433 > URL: https://issues.apache.org/jira/browse/FLINK-1433 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 0.8.1 > > > With the Hadoop file system wrapper, it's important to have access to the > Hadoop filesystem classes. > The HADOOP_CLASSPATH seems to be a standard environment variable used by > Hadoop for such libraries. > Deployments like Google Compute Cloud set this variable containing the > "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud > Storage in a non-YARN environment, we need to address this issue.
[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/337
[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293375#comment-14293375 ] ASF GitHub Bot commented on FLINK-1433: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/337#issuecomment-71631049 Merging it. > Add HADOOP_CLASSPATH to start scripts > - > > Key: FLINK-1433 > URL: https://issues.apache.org/jira/browse/FLINK-1433 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 0.8.1 > > > With the Hadoop file system wrapper, it's important to have access to the > Hadoop filesystem classes. > The HADOOP_CLASSPATH seems to be a standard environment variable used by > Hadoop for such libraries. > Deployments like Google Compute Cloud set this variable containing the > "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud > Storage in a non-YARN environment, we need to address this issue.
[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/337#issuecomment-71631049 Merging it.
[jira] [Commented] (FLINK-1433) Add HADOOP_CLASSPATH to start scripts
[ https://issues.apache.org/jira/browse/FLINK-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293267#comment-14293267 ] ASF GitHub Bot commented on FLINK-1433: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/337#issuecomment-71618846 Looks good to merge. Like Robert said, the `HADOOP_CLASSPATH` is used to add third-party libraries. From `hadoop-env.sh`:

    # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
    for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done

> Add HADOOP_CLASSPATH to start scripts > - > > Key: FLINK-1433 > URL: https://issues.apache.org/jira/browse/FLINK-1433 > Project: Flink > Issue Type: Improvement >Reporter: Robert Metzger > Fix For: 0.8.1 > > > With the Hadoop file system wrapper, it's important to have access to the > Hadoop filesystem classes. > The HADOOP_CLASSPATH seems to be a standard environment variable used by > Hadoop for such libraries. > Deployments like Google Compute Cloud set this variable containing the > "Google Cloud Storage Hadoop Wrapper". So if users want to use the Cloud > Storage in a non-YARN environment, we need to address this issue.
[GitHub] flink pull request: [FLINK-1433] Add HADOOP_CLASSPATH to start scr...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/337#issuecomment-71618846 Looks good to merge. Like Robert said, the `HADOOP_CLASSPATH` is used to add third-party libraries. From `hadoop-env.sh`:

    # Extra Java CLASSPATH elements. Automatically insert capacity-scheduler.
    for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
      if [ "$HADOOP_CLASSPATH" ]; then
        export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
      else
        export HADOOP_CLASSPATH=$f
      fi
    done
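On the Flink side, the change discussed here amounts to the start scripts folding HADOOP_CLASSPATH into the launch classpath when the variable is set. A minimal standalone sketch of that idea — `build_classpath` and the jar path are made-up names, not the real script's contents:

```shell
# Sketch only: append HADOOP_CLASSPATH to the JVM classpath if it is set.
# "lib/flink.jar" stands in for whatever the real script assembles.
build_classpath() {
  cp="lib/flink.jar"
  if [ -n "$HADOOP_CLASSPATH" ]; then
    cp="$cp:$HADOOP_CLASSPATH"
  fi
  echo "$cp"
}
```

This way, deployments that export HADOOP_CLASSPATH (e.g. for the Google Cloud Storage connector) get those jars on Flink's classpath without any Flink-specific configuration.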
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293258#comment-14293258 ] ASF GitHub Bot commented on FLINK-1442: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-71616952 @StephanEwen Yes, that was on purpose. The previous two data structures (`HashMap` and `Queue`) are now replaced by the `LinkedHashMap` which serves the same functionality. It might not be obvious but the `LinkedHashMap` preserves the order of the inserted items. From `scala.collection.mutable.LinkedHashMap`: > This class implements mutable maps using a hashtable. > The iterator and all traversal methods of this class visit elements in the order they were inserted. That's why `graphs.iterator.next()` always returns the least recently inserted item. > Archived Execution Graph consumes too much memory > - > > Key: FLINK-1442 > URL: https://issues.apache.org/jira/browse/FLINK-1442 > Project: Flink > Issue Type: Bug > Components: JobManager >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Max Michels > > The JobManager archives the execution graphs, for analysis of jobs. The > graphs may consume a lot of memory. > Especially the execution edges in all2all connection patterns are extremely > many and add up in memory consumption. > The execution edges connect all parallel tasks. So for an all2all pattern > between n and m tasks, there are n*m edges. For parallelism of multiple 100 > tasks, this can easily reach 100k objects and more, each with a set of > metadata. > I propose the following to solve that: > 1. Clear all execution edges from the graph (majority of the memory > consumers) when it is given to the archiver. > 2. Have the map/list of the archived graphs behind a soft reference, so it > will be removed under memory pressure before the JVM crashes. 
That may remove > graphs from the history early, but is much preferable to the JVM crashing, in > which case the graph is lost as well... > 3. Long term: The graph should be archived somewhere else. Something like the > History server used by Hadoop and Hive would be a good idea.
[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-71616952 @StephanEwen Yes, that was on purpose. The previous two data structures (`HashMap` and `Queue`) are now replaced by the `LinkedHashMap` which serves the same functionality. It might not be obvious but the `LinkedHashMap` preserves the order of the inserted items. From `scala.collection.mutable.LinkedHashMap`: > This class implements mutable maps using a hashtable. > The iterator and all traversal methods of this class visit elements in the order they were inserted. That's why `graphs.iterator.next()` always returns the least recently inserted item.
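The same insertion-order guarantee holds for Java's `java.util.LinkedHashMap`, so the pattern can be illustrated outside Scala as well. The class below is a made-up sketch (not the PR's code) of how one LinkedHashMap can replace a map-plus-queue pair for a bounded archive:

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the archiving pattern discussed above: one LinkedHashMap plays
// the role of both the lookup map and the FIFO queue, because its iterator
// visits entries in insertion order. Names are illustrative.
public class BoundedArchive {
    private final int capacity;
    private final LinkedHashMap<Integer, String> graphs = new LinkedHashMap<>();

    public BoundedArchive(int capacity) { this.capacity = capacity; }

    public void archive(int jobId, String graph) {
        graphs.put(jobId, graph);
        if (graphs.size() > capacity) {
            // iterator().next() yields the least recently inserted entry
            Iterator<Map.Entry<Integer, String>> it = graphs.entrySet().iterator();
            it.next();
            it.remove();
        }
    }

    public Integer oldestJobId() {
        return graphs.isEmpty() ? null : graphs.keySet().iterator().next();
    }
}
```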
[jira] [Commented] (FLINK-1452) Add "flink-contrib" maven module and README.md with the rules
[ https://issues.apache.org/jira/browse/FLINK-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293240#comment-14293240 ] Robert Metzger commented on FLINK-1452: --- In my opinion: {{flink-addons}}: - are usually part of {{flink-dist}} - are maintained by the core committers - are documented in the main documentation - are eventually moved out of "flink-addons" once they are "stable" (for example {{flink-yarn}} recently, and probably {{flink-streaming}} soon) {{flink-contrib}}: - Not part of {{flink-dist}} - documentation should live somewhere in the code, not in the main repo We could call {{flink-contrib}} --> {{flink-user-contrib}} .. but that would make the name pretty long Also, we could name {{flink-addons}} --> {{flink-unstable}}, but that's probably a bad name ;) I see the {{flink-addons}} as our internal incubator. Other potential names could be {{flink-beta}}, {{flink-incubator}} > Add "flink-contrib" maven module and README.md with the rules > - > > Key: FLINK-1452 > URL: https://issues.apache.org/jira/browse/FLINK-1452 > Project: Flink > Issue Type: New Feature > Components: flink-contrib >Reporter: Robert Metzger >Assignee: Robert Metzger > > I'll also create a JIRA component
[GitHub] flink pull request: [FLINK-1328] Reworked semantic annotations
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/311#issuecomment-71613018 +1 for merging it What's the plan with the documentation?
[jira] [Commented] (FLINK-1328) Rework Constant Field Annotations
[ https://issues.apache.org/jira/browse/FLINK-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293227#comment-14293227 ] ASF GitHub Bot commented on FLINK-1328: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/311#issuecomment-71613018 +1 for merging it What's the plan with the documentation? > Rework Constant Field Annotations > - > > Key: FLINK-1328 > URL: https://issues.apache.org/jira/browse/FLINK-1328 > Project: Flink > Issue Type: Improvement > Components: Java API, Optimizer, Scala API >Affects Versions: 0.7.0-incubating >Reporter: Fabian Hueske >Assignee: Fabian Hueske > > Constant field annotations are used by the optimizer to determine whether > physical data properties such as sorting or partitioning are retained by user- > defined functions. > The current implementation is limited and can be extended in several ways: > - Fields that are copied to other positions > - Field definitions for non-tuple data types (Pojos) > There is a pull request (#83) that goes into this direction and which can be > extended.
[jira] [Commented] (FLINK-377) Create a general purpose framework for language bindings
[ https://issues.apache.org/jira/browse/FLINK-377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293223#comment-14293223 ] ASF GitHub Bot commented on FLINK-377: -- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71612618 @dan-blanchard What non-JVM language are you looking for? Maybe we can do a little prototype with that language to see how well it works. Maybe you or somebody else from the community is interested in making the prototype production ready? > Create a general purpose framework for language bindings > > > Key: FLINK-377 > URL: https://issues.apache.org/jira/browse/FLINK-377 > Project: Flink > Issue Type: Improvement >Reporter: GitHub Import > Labels: github-import > Fix For: pre-apache > > > A general purpose API to run operators with arbitrary binaries. > This will allow to run Stratosphere programs written in Python, JavaScript, > Ruby, Go or whatever you like. > We suggest using Google Protocol Buffers for data serialization. This is the > list of languages that currently support ProtoBuf: > https://code.google.com/p/protobuf/wiki/ThirdPartyAddOns > Very early prototype with python: > https://github.com/rmetzger/scratch/tree/learn-protobuf (basically testing > protobuf) > For Ruby: https://github.com/infochimps-labs/wukong > Two new students working at Stratosphere (@skunert and @filiphaase) are > working on this. > The reference binding language will be for Python, but other bindings are > very welcome. > The best name for this so far is "stratosphere-lang-bindings". 
> I created this issue to track the progress (and give everybody a chance to > comment on this) > Imported from GitHub > Url: https://github.com/stratosphere/stratosphere/issues/377 > Created by: [rmetzger|https://github.com/rmetzger] > Labels: enhancement, > Assignee: [filiphaase|https://github.com/filiphaase] > Created at: Tue Jan 07 19:47:20 CET 2014 > State: open
[jira] [Commented] (FLINK-1105) Add support for locally sorted output
[ https://issues.apache.org/jira/browse/FLINK-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293225#comment-14293225 ] Stephan Ewen commented on FLINK-1105: - Adding to my comment above: That would mean that we make sorting not just a property of the sink, but an operator of its own. For any efficiency, this operator would need to be fused with successors > Add support for locally sorted output > - > > Key: FLINK-1105 > URL: https://issues.apache.org/jira/browse/FLINK-1105 > Project: Flink > Issue Type: Sub-task > Components: Java API >Reporter: Fabian Hueske >Assignee: Fabian Hueske >Priority: Minor > > This feature will make it possible to sort the output which is sent to an > OutputFormat to obtain a locally sorted result. > This feature was available in the "old" Java API and has not been ported to the > new Java API yet. Hence optimizer and runtime should already have support for > this feature. However, the API and job generation part is missing. > It is also a subfeature of FLINK-598, which will also provide globally sorted > results.
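The distinction between locally and globally sorted output can be illustrated with a toy, non-Flink sketch: each parallel sink instance sorts only its own partition before writing, so every output file ends up sorted, while the files are not sorted relative to each other.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of "locally sorted output" — not Flink code.
// Each inner list stands for one parallel partition handed to an
// OutputFormat; each partition is sorted independently of the others.
public class LocalSortSketch {
    public static <T> List<List<T>> sortPartitions(List<List<T>> partitions,
                                                   Comparator<T> order) {
        List<List<T>> result = new ArrayList<>();
        for (List<T> partition : partitions) {
            List<T> copy = new ArrayList<>(partition);
            copy.sort(order);   // local sort, one partition at a time
            result.add(copy);
        }
        return result;
    }
}
```

A globally sorted result (FLINK-598) would additionally require range-partitioning the data so the partitions themselves are ordered relative to one another.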
[GitHub] flink pull request: [FLINK-377] [FLINK-671] Generic Interface / PA...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/202#issuecomment-71612618 @dan-blanchard What non-JVM language are you looking for? Maybe we can do a little prototype with that language to see how well it works. Maybe you or somebody else from the community is interested in making the prototype production ready?