[jira] [Created] (FLINK-4586) NumberSequenceIterator and Accumulator threading issue
Johannes created FLINK-4586:
---

             Summary: NumberSequenceIterator and Accumulator threading issue
                 Key: FLINK-4586
                 URL: https://issues.apache.org/jira/browse/FLINK-4586
             Project: Flink
          Issue Type: Bug
          Components: DataSet API
    Affects Versions: 1.1.2
            Reporter: Johannes
            Priority: Minor

There is a strange problem when using the NumberSequenceIterator in combination with an AverageAccumulator.

It seems like the individual accumulators are reinitialized and overwrite parts of the intermediate result.

The following Scala snippet exemplifies the problem. The accumulator should report the average {{50.5}}, but the result is something completely different, such as {{8.08}}, depending on the number of cores used.

If the parallelism is set to {{1}}, the result is correct, which suggests a threading problem. The problem occurs with both the Java and the Scala API.

{code}
env
  .fromParallelCollection(new NumberSequenceIterator(1, 100))
  .map(new RichMapFunction[Long, Long] {
    var a: AverageAccumulator = _

    // Feed every element into the accumulator and pass it through unchanged.
    override def map(value: Long): Long = {
      a.add(value)
      value
    }

    // Register one accumulator per parallel subtask under the same name.
    override def open(parameters: Configuration): Unit = {
      a = new AverageAccumulator
      getRuntimeContext.addAccumulator("test", a)
    }
  })
  .reduce((a, b) => a + b)
  .print()

val lastJobExecutionResult: JobExecutionResult = env.getLastJobExecutionResult
println(lastJobExecutionResult.getAccumulatorResult("test"))
{code}

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
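For reference, the expected behaviour can be sketched in plain Java without any Flink classes. `RunningAverage` and `averageOverSplits` are hypothetical names used only for illustration; this is not Flink's `AverageAccumulator` and makes no claim about where the bug lies. It only shows that merging per-subtask partial averages by sum and count yields {{50.5}} for the sequence 1 to 100, whatever the number of splits.

```java
// Hypothetical stand-in for an averaging accumulator (not Flink code).
class RunningAverage {
    private long count;
    private double sum;

    // Record one value in this partial result.
    void add(long value) {
        count++;
        sum += value;
    }

    // Combine another partial result into this one by adding counts and sums.
    void merge(RunningAverage other) {
        count += other.count;
        sum += other.sum;
    }

    double value() {
        return count == 0 ? 0.0 : sum / count;
    }

    // Distribute from..to (inclusive) round-robin over `splits` partial
    // accumulators, then merge them, mimicking one accumulator per subtask.
    static double averageOverSplits(long from, long to, int splits) {
        RunningAverage[] partial = new RunningAverage[splits];
        for (int i = 0; i < splits; i++) {
            partial[i] = new RunningAverage();
        }
        for (long v = from; v <= to; v++) {
            partial[(int) ((v - from) % splits)].add(v);
        }
        RunningAverage total = new RunningAverage();
        for (RunningAverage p : partial) {
            total.merge(p);
        }
        return total.value();
    }
}
```

Calling `RunningAverage.averageOverSplits(1, 100, n)` gives the same average for any positive `n`, which is the property the job above is expected to have at any parallelism.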
[jira] [Created] (FLINK-3293) Custom Application Name on YARN is ignored in deploy jobmanager mode
Johannes created FLINK-3293:
---

             Summary: Custom Application Name on YARN is ignored in deploy jobmanager mode
                 Key: FLINK-3293
                 URL: https://issues.apache.org/jira/browse/FLINK-3293
             Project: Flink
          Issue Type: Bug
          Components: YARN Client
    Affects Versions: 0.10.1
            Reporter: Johannes
            Priority: Minor

FLINK-2298 introduced a custom name for the job. This name is ignored when the YARN application is started as part of the job submission, e.g.

    flink run -m yarn-cluster -ynm myname

The name is always set using the class name as the program name:

    flinkYarnClient.setName("Flink Application: " + programName);

The client gets constructed using

    AbstractFlinkYarnClient flinkYarnClient = CliFrontendParser.getFlinkYarnSessionCli().createFlinkYarnClient(commandLine);

So the name is parsed correctly; it is just overwritten afterwards. The class-name-based name should only be a fallback for when no name is provided.
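The proposed fallback could look roughly like the following Java sketch. `YarnApplicationNames.resolve` is a hypothetical helper, not an existing Flink method, and `customName` is assumed to hold the value passed via `-ynm` (or `null` when the option is absent).

```java
// Hypothetical helper illustrating the fallback; not part of Flink.
class YarnApplicationNames {

    // Prefer the user-supplied name; fall back to the class-name-based
    // default only when no usable custom name was given.
    static String resolve(String customName, String programName) {
        if (customName != null && !customName.trim().isEmpty()) {
            return customName;
        }
        return "Flink Application: " + programName;
    }
}
```

Under this assumption the submission code would set the name once, from the resolved value, instead of overwriting a name that has already been parsed.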
[jira] [Created] (FLINK-3090) Create a Parcel Distribution for Cloudera Manager
Johannes created FLINK-3090:
---

             Summary: Create a Parcel Distribution for Cloudera Manager
                 Key: FLINK-3090
                 URL: https://issues.apache.org/jira/browse/FLINK-3090
             Project: Flink
          Issue Type: Improvement
          Components: release
    Affects Versions: 0.10.1
            Reporter: Johannes
            Priority: Minor

For ease of deployment it would be nice to provide a parcel distribution of Flink which can be easily managed using Cloudera Manager. This would set up all necessary dependencies on all machines and start Flink jobs using YARN.

A good description of how to get started can be found in the [Cloudera Manager Extensibility Tools and Documentation|https://github.com/cloudera/cm_ext].

What needs to be done:
* Create a service description
* Create a parcel containing the release files
* Create scripts that can write the appropriate configuration files, taking values from the Cloudera Manager web config

For reference, there is [a collection showing how this is configured for other services|https://github.com/cloudera/cm_csds], such as Spark.

Some [starter code|https://github.com/jkirsch/cmflink] is available.
[jira] [Created] (FLINK-2984) Support lenient parsing of SVMLight input files
Johannes created FLINK-2984:
---

             Summary: Support lenient parsing of SVMLight input files
                 Key: FLINK-2984
                 URL: https://issues.apache.org/jira/browse/FLINK-2984
             Project: Flink
          Issue Type: Improvement
          Components: Machine Learning Library
    Affects Versions: 0.9.1
            Reporter: Johannes
            Priority: Trivial

The current implementation of the reader assumes that the input follows the exact specification.

The [splice-site dataset|https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#splice-site] is formatted slightly differently.

Example:
{noformat}
-1  1:0.381846 2:0.163648 3:0.245472 4:0.627318
{noformat}

Note the two spaces after the label. Currently MLUtils.scala splits on single spaces.
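One way to make the parsing lenient is to split on runs of whitespace instead of a single space. The following Java sketch illustrates the idea; `LenientSvmLightLine` is a hypothetical class written for this example and is not the code in MLUtils.scala. It assumes a non-empty line of the form `label index:value ...` and does not handle comments or malformed tokens.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical parser for a single SVMLight line; for illustration only.
class LenientSvmLightLine {
    final double label;
    final Map<Integer, Double> features;

    private LenientSvmLightLine(double label, Map<Integer, Double> features) {
        this.label = label;
        this.features = features;
    }

    static LenientSvmLightLine parse(String line) {
        // "\\s+" matches one or more whitespace characters, so repeated
        // spaces or tabs between tokens do not produce empty tokens.
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);

        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            // Each remaining token has the form index:value.
            String[] pair = tokens[i].split(":", 2);
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
        return new LenientSvmLightLine(label, features);
    }
}
```

With this approach a line with two spaces after the label parses to the same label and features as the strictly formatted variant.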
[jira] [Created] (FLINK-1596) FileIOChannel introduces space in temp file name
Johannes created FLINK-1596:
---

             Summary: FileIOChannel introduces space in temp file name
                 Key: FLINK-1596
                 URL: https://issues.apache.org/jira/browse/FLINK-1596
             Project: Flink
          Issue Type: Bug
          Components: Local Runtime
    Affects Versions: 0.9
            Reporter: Johannes
            Assignee: Johannes
            Priority: Minor

FLINK-1483 introduced separate directories for all threads. Unfortunately, this does not seem to work on Windows because the resulting file names contain spaces.