[jira] [Created] (FLINK-5545) remove FlinkAggregateExpandDistinctAggregatesRule when we upgrade calcite
Zhenghua Gao created FLINK-5545: --- Summary: remove FlinkAggregateExpandDistinctAggregatesRule when we upgrade calcite Key: FLINK-5545 URL: https://issues.apache.org/jira/browse/FLINK-5545 Project: Flink Issue Type: Bug Reporter: Zhenghua Gao Assignee: Zhenghua Gao Priority: Minor We copied Calcite's AggregateExpandDistinctAggregatesRule into the Flink project and applied a quick fix to avoid a bad case mentioned in CALCITE-1558. We should drop it and use Calcite's AggregateExpandDistinctAggregatesRule once we upgrade to Calcite 1.12 (or above). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [Dev] Dependencies issue related to implementing InputFormat Interface
Hi Pawan, If you want to read a file, you might want to extend the FileInputFormat class. It already has a lot of file-related functionality implemented. OT is the type of the records produced by the InputFormat, for example Tuple2 if the input format produces tuples with two fields of String and Integer types. Best, Fabian 2017-01-18 4:52 GMT+01:00 Pawan Manishka Gunarathna < pawan.manis...@gmail.com>: > Hi, > Yeah I also wrote in the way you have written.. > > public class ReadFromFile implements InputFormat{ > } > > Is that a problem with that declaration or dependencies ? > > Thanks, > Pawan > > On Tue, Jan 17, 2017 at 7:56 PM, Chesnay Schepler > wrote: > > > Hello, > > > > Did you write something like this? > > > >public class MyInputFormat implements InputFormat >InputSplit> { > > > >} > > > > Regards, > > Chesnay > > > > On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote: > > > >> Hi, > >> > >> I'm currently working on Flink InputFormat Interface implementation. I'm > >> writing a java program to read data from a file using InputFormat > >> Interface. I used maven project and I have added following dependencies > to > >> the pom.xml. > >> > >> org.apache.flink > >> flink-core > >> 1.1.4 > >> > >> org.apache.flink > >> flink-clients_2.11 > >> 1.1.4 > >> > >> org.apache.flink > >> flink-java > >> 1.1.4 > >> > >> I have a java class that implements InputFormat. It works with > >> InputFormat, but it didn't allow me to use InputFormat with type parameters; > >> the OT type parameter wasn't recognized. > >> > >> I need any kind of help to solve this problem. > >> > >> Thanks, > >> Pawan > >> > >> > > > > > -- > > *Pawan Gunaratne* > *Mob: +94 770373556* >
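Fabian's point about the OT type parameter can be sketched with a simplified stand-in interface. Note this is purely illustrative: the names below are not Flink's real API, and Flink's actual InputFormat<OT, T extends InputSplit> has many more methods (configure, open, nextRecord, close, etc.).

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class CsvLikeFormat {
    // Simplified stand-in for Flink's InputFormat<OT, T extends InputSplit>;
    // only the record type parameter OT is modeled here.
    interface SimpleInputFormat<OT> {
        Iterator<OT> read(List<String> lines);
    }

    // OT bound to a (String, Integer) pair, mirroring Fabian's Tuple2 example.
    static class PairFormat implements SimpleInputFormat<Map.Entry<String, Integer>> {
        @Override
        public Iterator<Map.Entry<String, Integer>> read(List<String> lines) {
            return lines.stream()
                    .map(l -> l.split(","))                              // "a,1" -> ["a", "1"]
                    .map(p -> Map.entry(p[0], Integer.parseInt(p[1])))   // -> ("a", 1)
                    .iterator();
        }
    }

    public static void main(String[] args) {
        Iterator<Map.Entry<String, Integer>> it = new PairFormat().read(List.of("a,1", "b,2"));
        Map.Entry<String, Integer> first = it.next();
        System.out.println(first.getKey() + " -> " + first.getValue()); // a -> 1
    }
}
```

The compile error in the thread comes from leaving OT unbound: `implements SimpleInputFormat` (raw) compiles with warnings, but referencing OT as a concrete type requires binding it as above.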
Re: [jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB
Hi Florian, The memory usage depends on the types of keys and namespaces. We have not experienced that many concurrently open windows. But given that each open window needs several bytes for its timer, 2M open windows may cost up to hundreds of MB. Regards Xiaogang 2017-01-18 14:45 GMT+08:00 Florian König : > Hi, > > that sounds very useful. We are using quite a lot of timers in custom > windows. Does anybody have experience with the memory requirements of, > let’s say, 2 million concurrently open windows and the associated timers? > > Thanks > Florian > > > Am 18.01.2017 um 04:40 schrieb Xiaogang Shi (JIRA) : > > > > Xiaogang Shi created FLINK-5544: > > --- > > > > Summary: Implement Internal Timer Service in RocksDB > > Key: FLINK-5544 > > URL: https://issues.apache.org/jira/browse/FLINK-5544 > > Project: Flink > > Issue Type: Bug > > Components: Streaming > >Reporter: Xiaogang Shi > > > > > > Now the only implementation of internal timer service is > HeapInternalTimerService which stores all timers in memory. In the cases > where the number of keys is very large, the timer service will cost too > much memory. A implementation which stores timers in RocksDB seems good to > deal with these cases. > > > > It might be a little challenging to implement a RocksDB timer service > because the timers are accessed in different ways. When timers are > triggered, we need to access timers in the order of timestamp. But when > performing checkpoints, we must have a method to obtain all timers of a > given key group. > > > > A good implementation, as suggested by [~StephanEwen], follows the idea > of merge sorting. We can store timers in RocksDB with the format > {{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put > together and are sorted. > > > > Then we can deploy an in-memory heap which keeps the first timer of each > key group to get the next timer to trigger. When a key group's first timer > is updated, we can efficiently update the heap. 
Re: [jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB
Hi, that sounds very useful. We are using quite a lot of timers in custom windows. Does anybody have experience with the memory requirements of, let’s say, 2 million concurrently open windows and the associated timers? Thanks Florian > Am 18.01.2017 um 04:40 schrieb Xiaogang Shi (JIRA) : > > Xiaogang Shi created FLINK-5544: > --- > > Summary: Implement Internal Timer Service in RocksDB > Key: FLINK-5544 > URL: https://issues.apache.org/jira/browse/FLINK-5544 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Xiaogang Shi > > > Now the only implementation of internal timer service is > HeapInternalTimerService which stores all timers in memory. In the cases > where the number of keys is very large, the timer service will cost too much > memory. A implementation which stores timers in RocksDB seems good to deal > with these cases. > > It might be a little challenging to implement a RocksDB timer service because > the timers are accessed in different ways. When timers are triggered, we need > to access timers in the order of timestamp. But when performing checkpoints, > we must have a method to obtain all timers of a given key group. > > A good implementation, as suggested by [~StephanEwen], follows the idea of > merge sorting. We can store timers in RocksDB with the format > {{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put > together and are sorted. > > Then we can deploy an in-memory heap which keeps the first timer of each key > group to get the next timer to trigger. When a key group's first timer is > updated, we can efficiently update the heap. > > > > -- > This message was sent by Atlassian JIRA > (v6.3.4#6332)
Re: [Dev] Dependencies issue related to implementing InputFormat Interface
Hi, Yeah I also wrote in the way you have written.. public class ReadFromFile implements InputFormat{ } Is that a problem with that declaration or dependencies ? Thanks, Pawan On Tue, Jan 17, 2017 at 7:56 PM, Chesnay Schepler wrote: > Hello, > > Did you write something like this? > >public class MyInputFormat implements InputFormatInputSplit> { > >} > > Regards, > Chesnay > > On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote: > >> Hi, >> >> I'm currently working on Flink InputFormat Interface implementation. I'm >> writing a java program to read data from a file using InputputFormat >> Interface. I used maven project and I have added following dependencies to >> the pom.xml. >> >> >> >> org.apache.flink >> flink-core >> 1.1.4 >> >> >> >> org.apache.flink >> flink-clients_2.11 >> 1.1.4 >> >> >> >> org.apache.flink >> flink-java >> 1.1.4 >> >> >> >> >> >> I have a java class that implements InputFormat. It works with >> *InputFormat. >> *But it didn't allow to used *InputFormat. *That >> OT field didn't recognized. >> >> I need a any kind of help to solve this problem. >> >> Thanks, >> Pawan >> >> > -- *Pawan Gunaratne* *Mob: +94 770373556*
[jira] [Created] (FLINK-5544) Implement Internal Timer Service in RocksDB
Xiaogang Shi created FLINK-5544: --- Summary: Implement Internal Timer Service in RocksDB Key: FLINK-5544 URL: https://issues.apache.org/jira/browse/FLINK-5544 Project: Flink Issue Type: Bug Components: Streaming Reporter: Xiaogang Shi Now the only implementation of the internal timer service is HeapInternalTimerService, which stores all timers in memory. In cases where the number of keys is very large, the timer service will cost too much memory. An implementation which stores timers in RocksDB seems a good way to deal with these cases. It might be a little challenging to implement a RocksDB timer service because the timers are accessed in different ways. When timers are triggered, we need to access timers in the order of timestamp. But when performing checkpoints, we must have a method to obtain all timers of a given key group. A good implementation, as suggested by [~StephanEwen], follows the idea of merge sorting. We can store timers in RocksDB with the format {{KEY_GROUP#TIMER#KEY}}. In this way, the timers under a key group are put together and are sorted. Then we can deploy an in-memory heap which keeps the first timer of each key group to get the next timer to trigger. When a key group's first timer is updated, we can efficiently update the heap.
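The merge-sort idea can be sketched with plain JDK collections. This is an illustration of the approach, not Flink code: it assumes each key group's timers come back already sorted by timestamp, as the {{KEY_GROUP#TIMER#KEY}} layout in RocksDB would guarantee, and the in-memory heap holds only the head timer of each key group.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TimerMergeSketch {
    // Merge pre-sorted per-key-group timer runs into one global trigger order.
    public static List<Long> triggerOrder(Map<Integer, List<Long>> sortedTimersPerGroup) {
        // heap entries: [timestamp, keyGroup, indexWithinGroup]
        PriorityQueue<long[]> heap = new PriorityQueue<>(Comparator.comparingLong(e -> e[0]));
        sortedTimersPerGroup.forEach((group, timers) -> {
            if (!timers.isEmpty()) heap.add(new long[]{timers.get(0), group, 0});
        });
        List<Long> order = new ArrayList<>();
        while (!heap.isEmpty()) {
            long[] head = heap.poll();            // next timer to trigger globally
            order.add(head[0]);
            List<Long> timers = sortedTimersPerGroup.get((int) head[1]);
            int next = (int) head[2] + 1;
            if (next < timers.size()) {
                // advance this key group's cursor; heap stays O(#key groups)
                heap.add(new long[]{timers.get(next), head[1], next});
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Map<Integer, List<Long>> groups = Map.of(0, List.of(1L, 7L), 1, List.of(3L, 5L));
        System.out.println(triggerOrder(groups)); // [1, 3, 5, 7]
    }
}
```

The heap never holds more than one entry per key group, which is what keeps memory bounded even with very many timers.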
[jira] [Created] (FLINK-5543) customCommandLine tips in CliFrontend
shijinkui created FLINK-5543: Summary: customCommandLine tips in CliFrontend Key: FLINK-5543 URL: https://issues.apache.org/jira/browse/FLINK-5543 Project: Flink Issue Type: Improvement Components: Client Reporter: shijinkui Tip: DefaultCLI must be added at the end, because getActiveCustomCommandLine(..) returns the active CustomCommandLine in order, and DefaultCLI's isActive() always returns true.
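The ordering pitfall can be shown with a toy model (the names below are stand-ins, not the real CliFrontend API): the first command line whose isActive() returns true wins, so an always-active default must be registered last.

```java
import java.util.List;
import java.util.function.Predicate;

public class CommandLineOrder {
    // Toy stand-in for a CustomCommandLine: a name plus an isActive check.
    record Cli(String name, Predicate<String> isActive) {}

    // Mimics getActiveCustomCommandLine(..): first active entry wins.
    static String active(List<Cli> clis, String args) {
        return clis.stream()
                .filter(c -> c.isActive().test(args))
                .findFirst()
                .map(Cli::name)
                .orElseThrow();
    }

    public static void main(String[] args) {
        Cli yarn = new Cli("yarn", a -> a.contains("yarn"));
        Cli def  = new Cli("default", a -> true); // like DefaultCLI: always active

        // correct order: specific entries first, catch-all default last
        System.out.println(active(List.of(yarn, def), "-m yarn-cluster")); // yarn
        // wrong order: the default shadows every other command line
        System.out.println(active(List.of(def, yarn), "-m yarn-cluster")); // default
    }
}
```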
[jira] [Created] (FLINK-5542) YARN client incorrectly uses local YARN config to check vcore capacity
Shannon Carey created FLINK-5542: Summary: YARN client incorrectly uses local YARN config to check vcore capacity Key: FLINK-5542 URL: https://issues.apache.org/jira/browse/FLINK-5542 Project: Flink Issue Type: Bug Components: YARN Affects Versions: 1.1.4 Reporter: Shannon Carey See http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/1-1-4-on-YARN-vcores-change-td11016.html When using bin/yarn-session.sh, AbstractYarnClusterDescriptor line 271 in 1.1.4 compares the user's selected number of vcores to the vcores configured in the local node's YARN config (from YarnConfiguration, e.g. yarn-site.xml and yarn-default.xml). It incorrectly prevents Flink from launching even if there is sufficient vcore capacity on the cluster. That is not correct, because the application will not necessarily run on the local node. For example, if running the yarn-session.sh client from the AWS EMR master node, the vcore count there may differ from the vcore count on the core nodes where Flink will actually run. A reasonable way to fix this would probably be to reuse the logic from "yarn-session.sh -q" (FlinkYarnSessionCli line 550), which knows how to get vcore information from the real worker nodes. Alternatively, perhaps we could remove the check entirely and rely on YARN's Scheduler to determine whether sufficient resources exist.
[jira] [Created] (FLINK-5541) Missing null check for localJar in FlinkSubmitter#submitTopology()
Ted Yu created FLINK-5541: - Summary: Missing null check for localJar in FlinkSubmitter#submitTopology() Key: FLINK-5541 URL: https://issues.apache.org/jira/browse/FLINK-5541 Project: Flink Issue Type: Bug Reporter: Ted Yu Priority: Minor
{code}
if (localJar == null) {
  try {
    for (final URL url : ((ContextEnvironment) ExecutionEnvironment.getExecutionEnvironment())
        .getJars()) {
      // TODO verify that there is only one jar
      localJar = new File(url.toURI()).getAbsolutePath();
    }
  } catch (final URISyntaxException e) {
    // ignore
  } catch (final ClassCastException e) {
    // ignore
  }
}
logger.info("Submitting topology " + name + " in distributed mode with conf " + serConf);
client.submitTopologyWithOpts(name, localJar, topology);
{code}
Since the try block may encounter URISyntaxException / ClassCastException, we should check that localJar is not null before calling submitTopologyWithOpts().
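A minimal sketch of the suggested guard, with stand-in names rather than the real FlinkSubmitter API: fail fast with a clear message instead of passing a null jar path on.

```java
public class SubmitGuard {
    // Stand-in for FlinkSubmitter#submitTopology(); only the null check is shown.
    static void submit(String name, String localJar) {
        if (localJar == null) {
            // the guard the issue asks for, before any submitTopologyWithOpts-style call
            throw new IllegalArgumentException(
                    "Could not determine jar for topology '" + name + "'; please pass it explicitly.");
        }
        // client.submitTopologyWithOpts(name, localJar, topology); // real call would go here
    }

    public static void main(String[] args) {
        try {
            submit("demo", null);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```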
Re: Re: States split over to external storage
Hi liuxinchun, Thanks for the prompt feedback! I think if the dev community finds it makes sense to invest in this feature, allowing users to configure the eviction strategy (2) makes sense to me. Given how much the growth of Flink job state varies, there might be an interface that lets the state backend decide which state can be evicted or restored. Regarding (1), I see optimizations there that can give an immediate performance boost. I would suggest raising a JIRA and discussing it with the whole dev community; there might be cases where it conflicts with upcoming refactorings. Note that the Flink devs are super busy releasing 1.2, so expect a late response :) Thanks, Chen > > (1) The organization of the current sliding > windows (SlidingProcessingTimeWindow > and SlidingEventTimeWindow) has a drawback: when using ListState, an > element may be kept in multiple windows (size / slide). This is time consuming > and wastes storage when checkpointing. > Opinion: I think this is a point for optimization. Elements can be organized > according to the key and split (which may also be called a pane). When > triggering cleanup, only the oldest split (pane) needs to be cleaned up. > (2) Incremental backup strategy. In the original idea, we planned to back up only > the newly arrived elements, which means a whole window may span several > checkpoints; we have developed this idea in our private SPS. But in > Flink, the window may not keep raw data (for example, ReducingState and > FoldingState). The idea of Chen Qin may be a candidate strategy. We can keep > in touch and exchange our respective strategies. > -Original Message- > From: Chen Qin [mailto:c...@uber.com] > Sent: 2017-01-17 13:30 > To: dev@flink.apache.org > Cc: iuxinc...@huawei.com; Aljoscha Krettek; shijinkui > Subject: States split over to external storage > > Hi there, > > I would like to discuss splitting local state over to external storage.
The > use case is NOT another external state backend like HDFS, rather just to > expand beyond what local disk/memory can hold when a large key space exceeds > what the task managers can handle. Realizing that FLINK-4266 might be hard to > tackle all-in-one, I would like to give split-over a shot first. > > An intuitive approach would be to treat the HeapStateBackend as an LRU cache and > split over to external key/value storage when a threshold is triggered. To make > this happen, we need a minor refactoring of the runtime and to add set/get logic. > One nice thing about keeping HDFS to store snapshots is that it avoids > versioning conflicts. Once a checkpoint restore happens, partially written data > will be overwritten with the previously checkpointed values. > > Comments? > > -- > -Chen Qin > -- -Chen Qin
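The LRU split-over idea can be sketched with a LinkedHashMap in access order, spilling evicted entries to a stand-in "external" key/value store. This is purely illustrative: a real state backend would also need serialization, checkpoint integration, and failure handling, none of which are modeled here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class SpillingStateSketch {
    final Map<String, String> external = new HashMap<>(); // stand-in for an external KV store
    final LinkedHashMap<String, String> hot;              // on-heap LRU cache

    SpillingStateSketch(int capacity) {
        // accessOrder=true turns LinkedHashMap into an LRU structure
        hot = new LinkedHashMap<>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                if (size() > capacity) {
                    external.put(eldest.getKey(), eldest.getValue()); // spill, don't lose
                    return true;
                }
                return false;
            }
        };
    }

    // read-through: fall back to the external store on a heap miss
    String get(String key) {
        String v = hot.get(key);
        return v != null ? v : external.get(key);
    }

    public static void main(String[] args) {
        SpillingStateSketch s = new SpillingStateSketch(2);
        s.hot.put("k1", "v1");
        s.hot.put("k2", "v2");
        s.hot.put("k3", "v3"); // capacity exceeded: k1 spills to the external store
        System.out.println(s.external.containsKey("k1")); // true
        System.out.println(s.get("k1"));                  // v1, served via read-through
    }
}
```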
[jira] [Created] (FLINK-5539) CLI: info/list/stop/cancel
Eron Wright created FLINK-5539: --- Summary: CLI: info/list/stop/cancel Key: FLINK-5539 URL: https://issues.apache.org/jira/browse/FLINK-5539 Project: Flink Issue Type: Sub-task Reporter: Eron Wright Implement the remaining CLI options (other than savepoint which is tracked by a different sub-task).
[jira] [Created] (FLINK-5540) CLI: savepoint
Eron Wright created FLINK-5540: --- Summary: CLI: savepoint Key: FLINK-5540 URL: https://issues.apache.org/jira/browse/FLINK-5540 Project: Flink Issue Type: Sub-task Reporter: Eron Wright Implement CLI support for savepoints, in both the 'run' and 'cancel' operations.
[jira] [Created] (FLINK-5538) Config option: Kerberos
Eron Wright created FLINK-5538: --- Summary: Config option: Kerberos Key: FLINK-5538 URL: https://issues.apache.org/jira/browse/FLINK-5538 Project: Flink Issue Type: Sub-task Reporter: Eron Wright
[jira] [Created] (FLINK-5537) Config option: SSL
Eron Wright created FLINK-5537: --- Summary: Config option: SSL Key: FLINK-5537 URL: https://issues.apache.org/jira/browse/FLINK-5537 Project: Flink Issue Type: Sub-task Reporter: Eron Wright
[jira] [Created] (FLINK-5535) Config option: HDFS
Eron Wright created FLINK-5535: --- Summary: Config option: HDFS Key: FLINK-5535 URL: https://issues.apache.org/jira/browse/FLINK-5535 Project: Flink Issue Type: Sub-task Reporter: Eron Wright
[jira] [Created] (FLINK-5536) Config option: HA
Eron Wright created FLINK-5536: --- Summary: Config option: HA Key: FLINK-5536 URL: https://issues.apache.org/jira/browse/FLINK-5536 Project: Flink Issue Type: Sub-task Reporter: Eron Wright
[jira] [Created] (FLINK-5534) Config option: 'flink-options'
Eron Wright created FLINK-5534: --- Summary: Config option: 'flink-options' Key: FLINK-5534 URL: https://issues.apache.org/jira/browse/FLINK-5534 Project: Flink Issue Type: Sub-task Reporter: Eron Wright Priority: Critical
[jira] [Created] (FLINK-5533) DCOS Integration
Eron Wright created FLINK-5533: --- Summary: DCOS Integration Key: FLINK-5533 URL: https://issues.apache.org/jira/browse/FLINK-5533 Project: Flink Issue Type: New Feature Components: Mesos Reporter: Eron Wright Assignee: Till Rohrmann Umbrella issue for DCOS integration, including production-level features but not future improvements/bugs (for which a new 'DCOS' component might work best).
[jira] [Created] (FLINK-5532) Make the marker window assigners for the fast aligned windows non-extendable.
Kostas Kloudas created FLINK-5532: - Summary: Make the marker window assigners for the fast aligned windows non-extendable. Key: FLINK-5532 URL: https://issues.apache.org/jira/browse/FLINK-5532 Project: Flink Issue Type: Bug Components: Windowing Operators Affects Versions: 1.2.0 Reporter: Kostas Kloudas Fix For: 1.2.0
[jira] [Created] (FLINK-5531) SSL code block formatting is broken
Chesnay Schepler created FLINK-5531: --- Summary: SSL code block formatting is broken Key: FLINK-5531 URL: https://issues.apache.org/jira/browse/FLINK-5531 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 1.2.0, 1.3.0 Reporter: Chesnay Schepler Assignee: Chesnay Schepler Fix For: 1.2.0, 1.3.0 Most code blocks on the SSL page aren't rendered properly and are simply shown as text.
[jira] [Created] (FLINK-5530) race condition in AbstractRocksDBState#getSerializedValue
Nico Kruber created FLINK-5530: -- Summary: race condition in AbstractRocksDBState#getSerializedValue Key: FLINK-5530 URL: https://issues.apache.org/jira/browse/FLINK-5530 Project: Flink Issue Type: Bug Components: Queryable State Affects Versions: 1.2.0 Reporter: Nico Kruber Assignee: Nico Kruber Priority: Blocker AbstractRocksDBState#getSerializedValue() uses the same key serialisation stream as the ordinary state access methods, but is called in parallel during state queries, thus violating the assumption that only one thread accesses it. This may lead to either wrong results in queries or corrupt data while queries are executed.
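The direction of the fix can be illustrated in plain Java: give each call its own serialization stream instead of sharing one mutable stream across threads. This is illustrative only, using JDK streams rather than Flink's actual serializer API.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerCallSerialization {
    // Safe under concurrency: every call allocates a fresh stream, so parallel
    // query threads never share mutable serialization state. (A ThreadLocal
    // stream would be another option if allocation cost mattered.)
    static byte[] serialize(long key) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeLong(key);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<byte[]>> tasks = List.of(
                () -> serialize(1L), () -> serialize(2L), () -> serialize(3L));
        for (Future<byte[]> f : pool.invokeAll(tasks)) {
            System.out.println(f.get().length); // 8 bytes per long, on every thread
        }
        pool.shutdown();
    }
}
```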
Re: [Dev] Dependencies issue related to implementing InputFormat Interface
Hello, Did you write something like this?

public class MyInputFormat implements InputFormat {
}

Regards, Chesnay On 17.01.2017 04:18, Pawan Manishka Gunarathna wrote: Hi, I'm currently working on the Flink InputFormat interface implementation. I'm writing a Java program to read data from a file using the InputFormat interface. I used a Maven project and I have added the following dependencies to the pom.xml:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-core</artifactId>
  <version>1.1.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-clients_2.11</artifactId>
  <version>1.1.4</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-java</artifactId>
  <version>1.1.4</version>
</dependency>

I have a Java class that implements InputFormat. It works with InputFormat, but it didn't allow me to use InputFormat with type parameters; the OT type parameter wasn't recognized. I need any kind of help to solve this problem. Thanks, Pawan
[jira] [Created] (FLINK-5529) Improve / extends windowing documentation
Stephan Ewen created FLINK-5529: --- Summary: Improve / extends windowing documentation Key: FLINK-5529 URL: https://issues.apache.org/jira/browse/FLINK-5529 Project: Flink Issue Type: Sub-task Components: Documentation Reporter: Stephan Ewen Assignee: Kostas Kloudas Fix For: 1.2.0, 1.3.0 Suggested Outline:
{code}
Windows

(0) Outline: The anatomy of a window operation

    stream
      [.keyBy(...)]        <- keyed versus non-keyed windows
      .window(...)         <- required: "assigner"
      [.trigger(...)]      <- optional: "trigger" (else default trigger)
      [.evictor(...)]      <- optional: "evictor" (else no evictor)
      [.allowedLateness()] <- optional, else zero
      .reduce/fold/apply() <- required: "function"

(1) Types of windows
    - tumble
    - slide
    - session
    - global

(2) Pre-defined windows
    timeWindow()  (tumble, slide)
    countWindow() (tumble, slide)
    - mention that count windows are inherently resource leaky unless limited key space

(3) Window Functions
    - apply: most basic, iterates over elements in window
    - aggregating: reduce and fold, can be used with "apply()" which will get one element
    - forward reference to state size section

(4) Advanced Windows
    - assigner
      - simple
      - merging
    - trigger
      - registering timers (processing time, event time)
      - state in triggers
    - life cycle of a window
      - create
      - state
      - cleanup
        - when is window contents purged
        - when is state dropped
        - when is metadata (like merging set) dropped

(5) Late data
    - picture
    - fire vs fire_and_purge: late accumulates vs late resurrects (cf discarding mode)

(6) Evictors
    - TBD

(7) State size: How large will the state be?
    Basic rule: Each element has one copy per window it is assigned to
      --> num windows * num elements in window
      --> example: tumbling is one copy, sliding(n,m) is n/m copies
      --> per key
    Pre-aggregation:
      - if reduce or fold is set -> one element per window (rather than num elements in window)
      - evictor voids pre-aggregation from the perspective of state
    Special rules:
      - fold cannot pre-aggregate on session windows (and other merging windows)

(8) Non-keyed windows
    - all elements through the same windows
    - currently not parallel
      - possible parallel in the future when having pre-aggregation functions
    - inherently (by definition) produce a result stream with parallelism one
    - state similar to one key of keyed windows
{code}
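The state-size basic rule reduces to simple arithmetic: each element is copied once per window it is assigned to, so tumbling windows keep one copy and sliding(size, slide) keeps size/slide copies per element, per key. A tiny worked example (illustrative helper, not a Flink API):

```java
public class WindowStateSize {
    // copies per element for a sliding window; tumbling is the size == slide case
    static long copies(long sizeMs, long slideMs) {
        return sizeMs / slideMs;
    }

    public static void main(String[] args) {
        System.out.println(copies(60_000, 60_000)); // tumbling 1 min        -> 1 copy
        System.out.println(copies(60_000, 10_000)); // 1 min, sliding by 10s -> 6 copies
    }
}
```

So a 1-minute window sliding by 10 seconds stores each element six times unless a reduce/fold pre-aggregates it down to one value per window.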
[jira] [Created] (FLINK-5528) tests: reduce the retry delay in QueryableStateITCase
Nico Kruber created FLINK-5528: -- Summary: tests: reduce the retry delay in QueryableStateITCase Key: FLINK-5528 URL: https://issues.apache.org/jira/browse/FLINK-5528 Project: Flink Issue Type: Improvement Components: Queryable State Affects Versions: 1.2.0 Reporter: Nico Kruber Assignee: Nico Kruber The QueryableStateITCase uses a retry of 1 second, e.g. if a queried key does not exist yet. This seems a bit too conservative as the job may not take that long to deploy and especially since getKvStateWithRetries() recovers from failures by retrying.
[jira] [Created] (FLINK-5527) QueryableState: requesting a non-existing key in MemoryStateBackend or FsStateBackend does not return the default value
Nico Kruber created FLINK-5527: -- Summary: QueryableState: requesting a non-existing key in MemoryStateBackend or FsStateBackend does not return the default value Key: FLINK-5527 URL: https://issues.apache.org/jira/browse/FLINK-5527 Project: Flink Issue Type: Improvement Components: Queryable State Affects Versions: 1.2.0 Reporter: Nico Kruber Assignee: Nico Kruber Querying for a non-existing key of a state that has a default value set currently results in an UnknownKeyOrNamespace exception when the MemoryStateBackend or FsStateBackend is used. It should return the default value instead, just like the RocksDBStateBackend does.
[jira] [Created] (FLINK-5526) QueryableState: notify upon receiving a query but having queryable state disabled
Nico Kruber created FLINK-5526: -- Summary: QueryableState: notify upon receiving a query but having queryable state disabled Key: FLINK-5526 URL: https://issues.apache.org/jira/browse/FLINK-5526 Project: Flink Issue Type: Improvement Components: Queryable State Affects Versions: 1.2.0 Reporter: Nico Kruber Priority: Minor When querying state but having it disabled in the config, a warning should be presented to the user that a query was received but the component is disabled. This is in addition to the query itself failing with a rather generic exception that is not pointing to this fact.
[jira] [Created] (FLINK-5525) Streaming Version of a Linear Regression model
Stavros Kontopoulos created FLINK-5525: -- Summary: Streaming Version of a Linear Regression model Key: FLINK-5525 URL: https://issues.apache.org/jira/browse/FLINK-5525 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Stavros Kontopoulos Given the nature of Flink, we should have streaming versions of the algorithms where possible. Updates of the model should be done on a per-window basis. An extreme case is: https://en.wikipedia.org/wiki/Online_machine_learning Resources [1] http://scikit-learn.org/dev/modules/scaling_strategies.html#incremental-learning [2] http://stats.stackexchange.com/questions/6920/efficient-online-linear-regression
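As a sketch of what a streaming update rule could look like, here is online SGD for a one-weight linear model (no intercept, for brevity) in plain Java. This is an illustration of the online-learning idea from the issue, not a proposed FlinkML API; in a Flink job, update() would run per element or per window.

```java
public class OnlineLinearRegression {
    double w = 0.0;     // current model: y ~ w * x
    final double lr;    // learning rate

    OnlineLinearRegression(double lr) {
        this.lr = lr;
    }

    // One online update: gradient step on the squared error of this sample.
    void update(double x, double y) {
        double err = y - w * x;
        w += lr * err * x;
    }

    public static void main(String[] args) {
        OnlineLinearRegression model = new OnlineLinearRegression(0.05);
        // replayed stream of noiseless samples from y = 2x
        for (int epoch = 0; epoch < 200; epoch++) {
            for (double x = 1; x <= 4; x++) {
                model.update(x, 2 * x);
            }
        }
        System.out.printf("w ~ %.3f%n", model.w); // converges toward 2.0
    }
}
```

The model needs only O(1) state per key, which is what makes the per-window (or even per-element) update practical in a streaming setting.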
Re: New Flink team member - Kate Eri.
ok, I've got it. I will take a look at https://github.com/apache/flink/pull/2735. Tue, Jan 17, 2017 at 14:36, Theodore Vasiloudis < theodoros.vasilou...@gmail.com>: > Hello Katherin, > > Welcome to the Flink community! > > The ML component definitely needs a lot of work you are correct, we are > facing similar problems to CEP, which we'll hopefully resolve with the > restructuring Stephan has mentioned in that thread. > > If you'd like to help out with PRs we have many open, one I have started > reviewing but got side-tracked is the Word2Vec one [1]. > > Best, > Theodore > > [1] https://github.com/apache/flink/pull/2735 > > On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske wrote: > > > Hi Katherin, > > > > welcome to the Flink community! > > Help with reviewing PRs is always very welcome and a great way to > > contribute. > > > > Best, Fabian > > > > > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko : > > > > > Thank you, Timo. > > > I have started the analysis of the topic. > > > And if it necessary, I will try to perform the review of other pulls) > > > > > > > > > Tue, Jan 17, 2017 at 13:09, Timo Walther : > > > > > > > Hi Katherin, > > > > > > > > great to hear that you would like to contribute! Welcome! > > > > > > > > I gave you contributor permissions. You can now assign issues to > > > > yourself. I assigned FLINK-1750 to you. > > > > Right now there are many open ML pull requests, you are very welcome > to > > > > review the code of others, too. > > > > > > > > Timo > > > > > > > > > > > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko: > > > > > Hello, All! > > > > > > > > > > > > > > > > > > > > I'm Kate Eri, I'm java developer with 6-year enterprise experience, > > > also > > > > I > > > > > have some expertise with scala (half of the year).
> > > > > > > > > > Last 2 years I have participated in several BigData projects that > > were > > > > > related to Machine Learning (Time series analysis, Recommender > > systems, > > > > > Social networking) and ETL. I have experience with Hadoop, Apache > > Spark > > > > and > > > > > Hive. > > > > > > > > > > > > > > > I’m fond of ML topic, and I see that Flink project requires some > work > > > in > > > > > this area, so that’s why I would like to join Flink and ask me to > > grant > > > > the > > > > > assignment of the ticket > > > > https://issues.apache.org/jira/browse/FLINK-1750 > > > > > to me. > > > > > > > > > > > > > > > > > > >
Re: New Flink team member - Kate Eri.
Hello Katherin, Welcome to the Flink community! The ML component definitely needs a lot of work you are correct, we are facing similar problems to CEP, which we'll hopefully resolve with the restructuring Stephan has mentioned in that thread. If you'd like to help out with PRs we have many open, one I have started reviewing but got side-tracked is the Word2Vec one [1]. Best, Theodore [1] https://github.com/apache/flink/pull/2735 On Tue, Jan 17, 2017 at 12:17 PM, Fabian Hueske wrote: > Hi Katherin, > > welcome to the Flink community! > Help with reviewing PRs is always very welcome and a great way to > contribute. > > Best, Fabian > > > > 2017-01-17 11:17 GMT+01:00 Katherin Sotenko : > > > Thank you, Timo. > > I have started the analysis of the topic. > > And if it necessary, I will try to perform the review of other pulls) > > > > > > вт, 17 янв. 2017 г. в 13:09, Timo Walther : > > > > > Hi Katherin, > > > > > > great to hear that you would like to contribute! Welcome! > > > > > > I gave you contributor permissions. You can now assign issues to > > > yourself. I assigned FLINK-1750 to you. > > > Right now there are many open ML pull requests, you are very welcome to > > > review the code of others, too. > > > > > > Timo > > > > > > > > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko: > > > > Hello, All! > > > > > > > > > > > > > > > > I'm Kate Eri, I'm java developer with 6-year enterprise experience, > > also > > > I > > > > have some expertise with scala (half of the year). > > > > > > > > Last 2 years I have participated in several BigData projects that > were > > > > related to Machine Learning (Time series analysis, Recommender > systems, > > > > Social networking) and ETL. I have experience with Hadoop, Apache > Spark > > > and > > > > Hive. 
> > > > > > > > > > > > I’m fond of ML topic, and I see that Flink project requires some work > > in > > > > this area, so that’s why I would like to join Flink and ask me to > grant > > > the > > > > assignment of the ticket > > > https://issues.apache.org/jira/browse/FLINK-1750 > > > > to me. > > > > > > > > > > > > >
[jira] [Created] (FLINK-5524) Support early out for code generated conjunctive conditions
Fabian Hueske created FLINK-5524: Summary: Support early out for code generated conjunctive conditions Key: FLINK-5524 URL: https://issues.apache.org/jira/browse/FLINK-5524 Project: Flink Issue Type: Improvement Components: Table API & SQL Affects Versions: 1.1.4, 1.2.0, 1.3.0 Reporter: Fabian Hueske Currently, all nested conditions for a conjunctive predicate are evaluated before the conjunction is checked. A condition like {{(v1 == v2) && (v3 < 5)}} would be compiled into
{code}
boolean res1;
if (v1 == v2) {
  res1 = true;
} else {
  res1 = false;
}

boolean res2;
if (v3 < 5) {
  res2 = true;
} else {
  res2 = false;
}

boolean res3;
if (res1 && res2) {
  res3 = true;
} else {
  res3 = false;
}

if (res3) {
  // emit something
}
{code}
It would be better to leave the generated code as early as possible, e.g., with a {{return}} instead of {{res1 = false}}. The code generator needs a bit of context information for that.
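A hand-written sketch of the two code shapes side by side (stand-in methods, not actual generated code): the current shape evaluates every conjunct, the early-out shape returns as soon as one conjunct fails.

```java
public class EarlyOutConjunction {
    // shape of the code generated today: all flags computed, then combined
    static boolean currentStyle(int v1, int v2, int v3) {
        boolean res1 = v1 == v2;
        boolean res2 = v3 < 5;   // evaluated even when res1 is already false
        boolean res3 = res1 && res2;
        return res3;
    }

    // shape the issue proposes: leave the generated code as early as possible
    static boolean earlyOut(int v1, int v2, int v3) {
        if (v1 != v2) {
            return false;        // early out: v3 is never touched
        }
        return v3 < 5;
    }

    public static void main(String[] args) {
        // both shapes agree on the result; only the work done differs
        System.out.println(currentStyle(1, 2, 3)); // false
        System.out.println(earlyOut(1, 2, 3));     // false
    }
}
```

The benefit grows with the number of conjuncts and the cost of each sub-condition (e.g. field access plus null check per operand).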
[jira] [Created] (FLINK-5523) Improve access of fields in code generated functions with filters
Fabian Hueske created FLINK-5523: Summary: Improve access of fields in code generated functions with filters Key: FLINK-5523 URL: https://issues.apache.org/jira/browse/FLINK-5523 Project: Flink Issue Type: Improvement Components: Table API & SQL Affects Versions: 1.1.4, 1.2.0, 1.3.0 Reporter: Fabian Hueske Priority: Minor The generated code for Table API / SQL queries accesses all required fields (for conditions and projections) and performs null checks before the first condition is evaluated. It would be better to move the access of fields which are only required for the projection behind the condition.
Re: New Flink team member - Kate Eri.
Hi Katherin, welcome to the Flink community! Help with reviewing PRs is always very welcome and a great way to contribute. Best, Fabian 2017-01-17 11:17 GMT+01:00 Katherin Sotenko : > Thank you, Timo. > I have started the analysis of the topic. > And if it necessary, I will try to perform the review of other pulls) > > > Tue, Jan 17, 2017 at 13:09, Timo Walther : > > > Hi Katherin, > > > > great to hear that you would like to contribute! Welcome! > > > > I gave you contributor permissions. You can now assign issues to > > yourself. I assigned FLINK-1750 to you. > > Right now there are many open ML pull requests, you are very welcome to > > review the code of others, too. > > > > Timo > > > > > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko: > > > Hello, All! > > > > > > > > > > > > I'm Kate Eri, I'm java developer with 6-year enterprise experience, > also > > I > > > have some expertise with scala (half of the year). > > > > > > Last 2 years I have participated in several BigData projects that were > > > related to Machine Learning (Time series analysis, Recommender systems, > > > Social networking) and ETL. I have experience with Hadoop, Apache Spark > > and > > > Hive. > > > > > > I’m fond of ML topic, and I see that Flink project requires some work > in > > > this area, so that’s why I would like to join Flink and ask me to grant > > the > > > assignment of the ticket > > https://issues.apache.org/jira/browse/FLINK-1750 > > > to me. > > > > > >
[jira] [Created] (FLINK-5522) Storm LocalCluster can't run with powermock
liuyuzhong7 created FLINK-5522: -- Summary: Storm LocalCluster can't run with powermock Key: FLINK-5522 URL: https://issues.apache.org/jira/browse/FLINK-5522 Project: Flink Issue Type: Improvement Components: DataStream API Affects Versions: 1.3.0 Reporter: liuyuzhong7 Fix For: 1.3.0 Storm's LocalCluster can't run with PowerMock. For example, see the code that is commented out in WrapperSetupHelperTest.testCreateTopologyContext. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: New Flink team member - Kate Eri.
Thank you, Timo. I have started the analysis of the topic. And if necessary, I will try to review other pull requests as well) Tue, Jan 17, 2017 at 13:09, Timo Walther : > Hi Katherin, > > great to hear that you would like to contribute! Welcome! > > I gave you contributor permissions. You can now assign issues to > yourself. I assigned FLINK-1750 to you. > Right now there are many open ML pull requests, you are very welcome to > review the code of others, too. > > Timo > > > Am 17/01/17 um 10:39 schrieb Katherin Sotenko: > > Hello, All! > > > > > > > > I'm Kate Eri, I'm java developer with 6-year enterprise experience, also > I > > have some expertise with scala (half of the year). > > > > Last 2 years I have participated in several BigData projects that were > > related to Machine Learning (Time series analysis, Recommender systems, > > Social networking) and ETL. I have experience with Hadoop, Apache Spark > and > > Hive. > > > > > > I’m fond of ML topic, and I see that Flink project requires some work in > > this area, so that’s why I would like to join Flink and ask me to grant > the > > assignment of the ticket > https://issues.apache.org/jira/browse/FLINK-1750 > > to me. > > > >
Re: New Flink team member - Kate Eri.
Hi Katherin, great to hear that you would like to contribute! Welcome! I gave you contributor permissions. You can now assign issues to yourself. I assigned FLINK-1750 to you. Right now there are many open ML pull requests, you are very welcome to review the code of others, too. Timo Am 17/01/17 um 10:39 schrieb Katherin Sotenko: Hello, All! I'm Kate Eri, I'm java developer with 6-year enterprise experience, also I have some expertise with scala (half of the year). Last 2 years I have participated in several BigData projects that were related to Machine Learning (Time series analysis, Recommender systems, Social networking) and ETL. I have experience with Hadoop, Apache Spark and Hive. I’m fond of ML topic, and I see that Flink project requires some work in this area, so that’s why I would like to join Flink and ask me to grant the assignment of the ticket https://issues.apache.org/jira/browse/FLINK-1750 to me.
[jira] [Created] (FLINK-5521) remove unused KvStateRequestSerializer#serializeList
Nico Kruber created FLINK-5521: -- Summary: remove unused KvStateRequestSerializer#serializeList Key: FLINK-5521 URL: https://issues.apache.org/jira/browse/FLINK-5521 Project: Flink Issue Type: Improvement Components: Queryable State Affects Versions: 1.2.0 Reporter: Nico Kruber Assignee: Nico Kruber KvStateRequestSerializer#serializeList is unused and instead the state backends' serialisation functions are used. Therefore, remove this one and make sure KvStateRequestSerializer#deserializeList works with the state backends' ones. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
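The key constraint in FLINK-5521 is that serialization and deserialization of a list must stay symmetric: whatever format the state backends write, {{deserializeList}} must read back. The sketch below illustrates such a round trip with a simple length-prefixed layout; this layout is an assumption for illustration only, not Flink's actual queryable-state wire format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative list (de)serializer pair: each entry is written with a
// 4-byte big-endian length prefix, so deserialization can walk the buffer.
public class ListSerializerSketch {

    static byte[] serializeList(List<byte[]> entries) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] e : entries) {
            // 4-byte big-endian length prefix per entry
            out.write((e.length >>> 24) & 0xFF);
            out.write((e.length >>> 16) & 0xFF);
            out.write((e.length >>> 8) & 0xFF);
            out.write(e.length & 0xFF);
            out.write(e, 0, e.length);
        }
        return out.toByteArray();
    }

    static List<byte[]> deserializeList(byte[] data) {
        List<byte[]> result = new ArrayList<>();
        int pos = 0;
        while (pos < data.length) {
            int len = ((data[pos] & 0xFF) << 24) | ((data[pos + 1] & 0xFF) << 16)
                    | ((data[pos + 2] & 0xFF) << 8) | (data[pos + 3] & 0xFF);
            pos += 4;
            byte[] e = new byte[len];
            System.arraycopy(data, pos, e, 0, len);
            pos += len;
            result.add(e);
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> in = new ArrayList<>();
        in.add("a".getBytes(StandardCharsets.UTF_8));
        in.add("bc".getBytes(StandardCharsets.UTF_8));
        List<byte[]> out = deserializeList(serializeList(in));
        System.out.println(out.size()); // 2
    }
}
```

Dropping one half of such a pair (as FLINK-5521 does with {{serializeList}}) is only safe when the remaining half is verified against whatever the state backends actually produce.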
New Flink team member - Kate Eri.
Hello, All! I'm Kate Eri, a Java developer with six years of enterprise experience; I also have about half a year of experience with Scala. Over the last two years I have participated in several BigData projects related to Machine Learning (time series analysis, recommender systems, social networking) and ETL. I have experience with Hadoop, Apache Spark and Hive. I’m fond of the ML topic, and I see that the Flink project requires some work in this area, which is why I would like to join Flink and ask to be assigned the ticket https://issues.apache.org/jira/browse/FLINK-1750.