[jira] [Created] (FLINK-3290) [py] Generalize OperationInfo transfer

2016-01-26 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-3290:
---

 Summary: [py] Generalize OperationInfo transfer
 Key: FLINK-3290
 URL: https://issues.apache.org/jira/browse/FLINK-3290
 Project: Flink
  Issue Type: Bug
  Components: Python API
Affects Versions: 0.10.1
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.0.0


A fixed set of arguments is transferred whenever a user defines an operation. 
For a CSV source, for example, these are the delimiters and the file path; for a 
map function only the set IDs are transferred. As a result, a separate routine 
is defined for every operator that governs which arguments are transferred.

While working on FLINK-3275 I realized that adding a new argument/parameter, in 
this case parallelism, is not as straightforward as it could be. Most newly 
added operators will require a new routine, and adding a new argument may 
require modifying multiple routines. Over time, this is bound to become a big 
mess.

All arguments are stored in an OperationInfo object, which also contains 
default values for all unused arguments. I want to generalize the whole affair 
by transferring all arguments, used or not. 

This will reduce clutter, make it easier to add new parameters (only 4 new 
lines are needed: 2 for defining the new field inside the Java and Python 
OperationInfo classes, and 1 each for sending/receiving the new argument) and 
will make the transfer consistent across all operations.
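
A minimal sketch of what the generalized transfer could look like, assuming 
illustrative field names and a plain list as a stand-in for the actual transfer 
channel (this is not the real Flink code):

    import java.util.ArrayList;
    import java.util.List;

    // Every OperationInfo field is always transferred, regardless of the
    // operation type, so no per-operator transfer routine is needed.
    public class OperationInfoSketch {
        int parentID;
        int otherID;
        String path = "";            // only used by sources/sinks, default otherwise
        String lineDelimiter = "\n"; // only used by CSV operations, default otherwise
        int parallelism = -1;        // new parameter: one field here, one in the
                                     // Python OperationInfo class

        // Stand-in for whatever channel ships the arguments to the Python
        // process; adding an argument costs exactly one extra line here.
        List<Object> serialize() {
            List<Object> out = new ArrayList<>();
            out.add(parentID);
            out.add(otherID);
            out.add(path);
            out.add(lineDelimiter);
            out.add(parallelism);
            return out;
        }
    }

On the Python side, the receiving routine would read the same fixed sequence of 
fields, so both OperationInfo classes stay in sync by construction.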





[jira] [Created] (FLINK-3291) Object reuse bug in MergeIterator.HeadStream.nextHead

2016-01-26 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-3291:
--

 Summary: Object reuse bug in MergeIterator.HeadStream.nextHead
 Key: FLINK-3291
 URL: https://issues.apache.org/jira/browse/FLINK-3291
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Gabor Gevay
Assignee: Gabor Gevay
Priority: Critical


MergeIterator.HeadStream.nextHead saves a reference to the `reuse` object it 
receives as an argument into `this.head`. That object might later be modified 
by the caller.

This actually happens when ReduceDriver.run calls input.next (which is 
MergeIterator.next(E reuse)) in the inner while loop of the objectReuseEnabled 
branch. That call passes the reference it got from ReduceDriver to top.nextHead, 
which erroneously saves it, and ReduceDriver later uses that same object for 
doing the reduce.

Another way in which this fails is when MergeIterator.next(E reuse) gives 
`reuse` to different `top`s in different calls, and then the heads end up being 
the same object.

You can observe the latter situation in action by running ReducePerformance 
here:
https://github.com/ggevay/flink/tree/merge-iterator-object-reuse-bug
Set the memory to -Xmx200m (so that the MergeIterator actually has merging to 
do), put a breakpoint at the beginning of MergeIterator.next(reuse), and then 
watch `reuse` and the heads of the first two elements of `this.heap` in the 
debugger. They become the same object after hitting continue about 6 times.

You can also look at the count that is printed at the end, which shouldn't be 
larger than the key range. Also, if you look into the output file 
/tmp/xxxobjectreusebug, the key 77, for example, appears twice.
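
To make that failure mode concrete, here is a tiny standalone illustration 
(plain Java with made-up Element/BuggyHeadStream classes, not the actual Flink 
code) of what happens when a head stream keeps the caller's reuse object:

    // Stand-in for a mutable record that gets reused.
    final class Element {
        int key;
    }

    // Mirrors the problematic pattern: the caller's `reuse` object is stored.
    final class BuggyHeadStream {
        Element head;

        void nextHead(Element reuse, int newKey) {
            reuse.key = newKey;
            this.head = reuse; // bug: keeps a reference to a caller-owned object
        }
    }

    public class AliasingDemo {
        public static void main(String[] args) {
            Element reuse = new Element();
            BuggyHeadStream a = new BuggyHeadStream();
            BuggyHeadStream b = new BuggyHeadStream();

            a.nextHead(reuse, 1);
            b.nextHead(reuse, 2); // same `reuse` handed to a different stream

            System.out.println(a.head == b.head); // true: both heads alias `reuse`
            System.out.println(a.head.key);       // 2: the first head's value is lost
        }
    }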

The good news is that I think I can see an easy fix that doesn't affect 
performance: MergeIterator.HeadStream could have a reuse object of its own as a 
member and give that to iterator.next in nextHead(E reuse). Then the overload 
of nextHead that takes the reuse parameter would no longer be needed, and 
MergeIterator.next(E reuse) could simply call its other overload.
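
A rough sketch of that fix, assuming a simplified stand-in interface for 
Flink's reuse-aware iterators (the names and structure here are illustrative, 
not the actual MergeIterator internals):

    import java.io.IOException;

    // Stand-in for a reuse-aware iterator (returns null when exhausted).
    interface ReusingIterator<E> {
        E next(E reuse) throws IOException;
    }

    // The stream owns its reuse object, so a caller's object can never
    // end up stored in `head`.
    final class HeadStream<E> {
        private final ReusingIterator<E> iterator;
        private E ownReuse;
        private E head;

        HeadStream(ReusingIterator<E> iterator, E initialReuse) {
            this.iterator = iterator;
            this.ownReuse = initialReuse;
        }

        // No reuse parameter anymore; MergeIterator.next(E reuse) could then
        // simply delegate to its no-argument overload.
        boolean nextHead() throws IOException {
            E next = iterator.next(ownReuse);
            if (next == null) {
                return false;
            }
            ownReuse = next; // keep reusing whatever instance comes back
            head = next;
            return true;
        }

        E getHead() {
            return head;
        }
    }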





[jira] [Created] (FLINK-3292) Bug in flink-jdbc. Not all JDBC drivers supported

2016-01-26 Thread Subhobrata Dey (JIRA)
Subhobrata Dey created FLINK-3292:
-

 Summary: Bug in flink-jdbc. Not all JDBC drivers supported
 Key: FLINK-3292
 URL: https://issues.apache.org/jira/browse/FLINK-3292
 Project: Flink
  Issue Type: Bug
  Components: other
Affects Versions: 1.0.0
Reporter: Subhobrata Dey
Priority: Minor
 Fix For: 1.0.0


Hello,

In the open method of JDBCInputFormat.java, the resultSetType & 
resultSetConcurrency passed to dbConn.createStatement are hardcoded. 
These two fields may vary between JDBC drivers & hence it fails with some 
drivers, such as the SAP HANA JDBC driver. 
There are two variants of the method dbConn.createStatement, one with 
parameters & the other without. Both should be supported. 
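
For reference, a short sketch of the two java.sql.Connection#createStatement 
variants in question; the specific ResultSet constants below are an assumption 
for illustration, not quoted from JDBCInputFormat:

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;

    public class StatementVariants {

        // Current behaviour: resultSetType & resultSetConcurrency are fixed,
        // which some drivers (e.g. SAP HANA) do not accept.
        static Statement withHardcodedOptions(Connection dbConn) throws SQLException {
            return dbConn.createStatement(
                    ResultSet.TYPE_SCROLL_INSENSITIVE,
                    ResultSet.CONCUR_READ_ONLY);
        }

        // Requested alternative: use the driver's defaults when the fixed
        // combination is not supported.
        static Statement withDriverDefaults(Connection dbConn) throws SQLException {
            return dbConn.createStatement();
        }
    }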

Thanks & regards,
Subhobrata





[jira] [Created] (FLINK-3293) Custom Application Name on YARN is ignored in deploy jobmanager mode

2016-01-26 Thread Johannes (JIRA)
Johannes created FLINK-3293:
---

 Summary: Custom Application Name on YARN is ignored in deploy 
jobmanager mode
 Key: FLINK-3293
 URL: https://issues.apache.org/jira/browse/FLINK-3293
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 0.10.1
Reporter: Johannes
Priority: Minor


FLINK-2298 introduced a custom name for the job.

This is ignored when the YARN application is started as part of the job 
submission, e.g.

   flink run -m yarn-cluster -ynm myname

The name is always set using the class name as the program name:

   flinkYarnClient.setName("Flink Application: " + programName);

The client gets constructed using

   AbstractFlinkYarnClient flinkYarnClient = 
       CliFrontendParser.getFlinkYarnSessionCli().createFlinkYarnClient(commandLine);

so the name from the command line is parsed correctly; it is just overwritten.
The generated name should only be used as a fallback when no name is provided.
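
A sketch of the suggested fallback, using a hypothetical helper method (the CLI 
and client classes are only referenced by name above):

    public class ApplicationNameFallback {

        // Returns the name passed via -ynm when present; otherwise falls back
        // to the generated "Flink Application: <class>" default.
        static String resolveApplicationName(String customName, String programName) {
            if (customName != null && !customName.isEmpty()) {
                return customName;
            }
            return "Flink Application: " + programName;
        }

        public static void main(String[] args) {
            System.out.println(resolveApplicationName("myname", "WordCount")); // myname
            System.out.println(resolveApplicationName(null, "WordCount"));     // Flink Application: WordCount
        }
    }

flinkYarnClient.setName(...) would then be called with the resolved name instead 
of unconditionally overwriting the custom one.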


