[jira] [Created] (FLINK-4586) NumberSequenceIterator and Accumulator threading issue

2016-09-06 Thread Johannes (JIRA)
Johannes created FLINK-4586:
---

 Summary: NumberSequenceIterator and Accumulator threading issue
 Key: FLINK-4586
 URL: https://issues.apache.org/jira/browse/FLINK-4586
 Project: Flink
  Issue Type: Bug
  Components: DataSet API
Affects Versions: 1.1.2
Reporter: Johannes
Priority: Minor


There is a strange problem when using the NumberSequenceIterator in combination 
with an AverageAccumulator.

It seems like the individual accumulators are reinitialized and overwrite parts 
of the intermediate solution.

The following Scala snippet exemplifies the problem.
The accumulator should report the correct average, {{50.5}}, but instead reports 
something completely different, like {{8.08}}, depending on the number of cores 
used.
If the parallelism is set to {{1}} the result is correct, which suggests a 
threading problem. The problem occurs with both the Java and the Scala API.

{code}
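import org.apache.flink.api.common.JobExecutionResult
import org.apache.flink.api.common.accumulators.AverageAccumulator
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.util.NumberSequenceIterator

// Assumed setup: the original snippet uses env without showing its creation.
val env = ExecutionEnvironment.getExecutionEnvironment

env
  .fromParallelCollection(new NumberSequenceIterator(1, 100))
  .map(new RichMapFunction[Long, Long] {
    var a: AverageAccumulator = _

    // Feed every element into the accumulator and pass it through unchanged.
    override def map(value: Long): Long = {
      a.add(value)
      value
    }

    // Each parallel task instance creates and registers its own accumulator.
    override def open(parameters: Configuration): Unit = {
      a = new AverageAccumulator
      getRuntimeContext.addAccumulator("test", a)
    }
  })
  .reduce((a, b) => a + b)
  .print()

// Expected 50.5; with parallelism > 1 a different value such as 8.08 appears.
val lastJobExecutionResult: JobExecutionResult = env.getLastJobExecutionResult

println(lastJobExecutionResult.getAccumulatorResult("test"))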
{code}
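
For comparison, here is a minimal Java sketch of how per-task averages can be 
combined so that the result does not depend on the parallelism. It is not 
Flink's AverageAccumulator, and it does not claim to locate the bug; the class 
{{PartialAverage}} and the contiguous split in {{averageOfRange}} are 
hypothetical and only illustrate why {{50.5}} is the expected value for any 
number of partitions.

```java
import java.util.stream.LongStream;

/** Illustrative only: a mergeable average that tracks sum and count. */
final class PartialAverage {
    private double sum;
    private long count;

    void add(long value) {
        sum += value;
        count++;
    }

    /** Merge by adding sums and counts, never by averaging two averages. */
    void merge(PartialAverage other) {
        sum += other.sum;
        count += other.count;
    }

    double result() {
        return count == 0 ? 0.0 : sum / count;
    }

    /**
     * Splits 1..n into contiguous chunks (a hypothetical split, one chunk per
     * simulated task), averages each chunk separately, then merges the parts.
     */
    static double averageOfRange(long n, int parts) {
        PartialAverage total = new PartialAverage();
        long chunk = (n + parts - 1) / parts;
        for (long start = 1; start <= n; start += chunk) {
            long end = Math.min(n, start + chunk - 1);
            PartialAverage partial = new PartialAverage();
            LongStream.rangeClosed(start, end).forEach(partial::add);
            total.merge(partial);
        }
        return total.result();
    }

    public static void main(String[] args) {
        System.out.println(averageOfRange(100, 1)); // prints 50.5
        System.out.println(averageOfRange(100, 8)); // prints 50.5
    }
}
```

Keeping the sum and the count separately makes the merge order irrelevant, so 
one partition and eight partitions give the same answer.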



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3293) Custom Application Name on YARN is ignored in deploy jobmanager mode

2016-01-26 Thread Johannes (JIRA)
Johannes created FLINK-3293:
---

 Summary: Custom Application Name on YARN is ignored in deploy 
jobmanager mode
 Key: FLINK-3293
 URL: https://issues.apache.org/jira/browse/FLINK-3293
 Project: Flink
  Issue Type: Bug
  Components: YARN Client
Affects Versions: 0.10.1
Reporter: Johannes
Priority: Minor


FLINK-2298 introduced a custom name for the job.

This name is ignored when the YARN application is started as part of the job 
submission, e.g.

   flink run -m yarn-cluster -ynm myname

The application name is always set from the class name, used as the program name:

   flinkYarnClient.setName("Flink Application: " + programName);

The client gets constructed using

   AbstractFlinkYarnClient flinkYarnClient = 
       CliFrontendParser.getFlinkYarnSessionCli().createFlinkYarnClient(commandLine);

So the custom name is parsed correctly; it is just overwritten afterwards.
The class-name-based name should only be a fallback when no name is provided.
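
The intended fallback behaviour could look roughly like the following Java 
sketch. The class {{YarnAppName}} and its {{resolve}} method are hypothetical 
and not part of the Flink code base; only the {{"Flink Application: "}} prefix 
is taken from the line quoted above.

```java
/** Illustrative only: choose the YARN application name with a fallback. */
final class YarnAppName {

    /**
     * Returns the user-supplied name (e.g. from -ynm) when one is given,
     * otherwise falls back to the name derived from the program name.
     */
    static String resolve(String customName, String programName) {
        if (customName != null && !customName.trim().isEmpty()) {
            return customName;
        }
        return "Flink Application: " + programName;
    }

    public static void main(String[] args) {
        // A custom name wins over the derived one.
        System.out.println(resolve("myname", "com.example.WordCount"));
        // Without a custom name the derived name is used.
        System.out.println(resolve(null, "com.example.WordCount"));
    }
}
```

In the client this would mean calling {{setName}} with the derived name only if 
no name has been set from the command line.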





[jira] [Created] (FLINK-3090) Create a Parcel Distribution for Cloudera Manager

2015-11-28 Thread Johannes (JIRA)
Johannes created FLINK-3090:
---

 Summary: Create a Parcel Distribution for Cloudera Manager
 Key: FLINK-3090
 URL: https://issues.apache.org/jira/browse/FLINK-3090
 Project: Flink
  Issue Type: Improvement
  Components: release
Affects Versions: 0.10.1
Reporter: Johannes
Priority: Minor


For ease of deployment it would be nice to provide a parcel distribution of 
Flink which can be easily managed using Cloudera Manager.

This would set up all necessary dependencies on all machines and start Flink 
jobs using YARN.

A good description of how to get started can be found in the [Cloudera Manager 
Extensibility Tools and Documentation|https://github.com/cloudera/cm_ext].

What needs to be done
* Create a service description
* Create a parcel containing the release files
* Create scripts that can write the appropriate configuration files, taking 
values from the Cloudera Manager web config

For reference, there is [a collection showing how this is configured for other 
services|https://github.com/cloudera/cm_csds], such as Spark.

Some [starter code|https://github.com/jkirsch/cmflink] can be found here.





[jira] [Created] (FLINK-2984) Support lenient parsing of SVMLight input files

2015-11-06 Thread Johannes (JIRA)
Johannes created FLINK-2984:
---

 Summary: Support lenient parsing of SVMLight input files
 Key: FLINK-2984
 URL: https://issues.apache.org/jira/browse/FLINK-2984
 Project: Flink
  Issue Type: Improvement
  Components: Machine Learning Library
Affects Versions: 0.9.1
Reporter: Johannes
Priority: Trivial


The current implementation of the reader assumes that the input follows the 
exact specification.

The [splice-site dataset|
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#splice-site]
is formatted slightly differently.

Example:
{noformat}
-1  1:0.381846 2:0.163648 3:0.245472 4:0.627318
{noformat}

Note the two spaces after the label.

Currently MLUtils.scala splits on single spaces.
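
A lenient reader could split on runs of whitespace instead. The following is a 
minimal Java sketch of that idea, not the code in MLUtils.scala; the class 
{{LenientSvmLightLine}} is hypothetical. Splitting the example line on a single 
space yields an empty token after the label, while splitting on {{\s+}} does 
not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: parses one SVMLight line, tolerating repeated spaces. */
final class LenientSvmLightLine {
    final double label;
    final Map<Integer, Double> features = new LinkedHashMap<>();

    LenientSvmLightLine(String line) {
        // Split on one or more whitespace characters, so "-1  1:0.38" works.
        String[] tokens = line.trim().split("\\s+");
        label = Double.parseDouble(tokens[0]);
        for (int i = 1; i < tokens.length; i++) {
            // Each remaining token has the form index:value.
            String[] pair = tokens[i].split(":", 2);
            features.put(Integer.parseInt(pair[0]), Double.parseDouble(pair[1]));
        }
    }

    public static void main(String[] args) {
        LenientSvmLightLine parsed = new LenientSvmLightLine(
            "-1  1:0.381846 2:0.163648 3:0.245472 4:0.627318");
        System.out.println(parsed.label + " " + parsed.features);
    }
}
```

The sketch does not handle blank lines or trailing comments; it only shows the 
change in the split.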





[jira] [Created] (FLINK-1596) FileIOChannel introduces space in temp file name

2015-02-21 Thread Johannes (JIRA)
Johannes created FLINK-1596:
---

 Summary: FileIOChannel introduces space in temp file name
 Key: FLINK-1596
 URL: https://issues.apache.org/jira/browse/FLINK-1596
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Johannes
Assignee: Johannes
Priority: Minor


FLINK-1483 introduced separate directories for all threads.
Unfortunately, this does not seem to work on Windows, due to spaces in the file name.


