Re: ConvertCSVtoAvro | support for "||" delimiter
I didn't know there was a unit separator character, thanks for the suggestion. I think I have a lot of ☃ to replace. If you can paste the unit separator character in, then it should work. The underlying code supports escape sequences, like \t, but the validation doesn't take those into account yet. That would be a good starter contribution for someone out there... rb On 02/04/2016 12:39 PM, Alan Jackoway wrote: Though I love the concept of ☃ as your separator, my belief is that the correct way to do this is to replace your custom delimiter with the ones that are defined in ASCII (and therefore extremely unlikely to appear in your data): https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text That said, I have not actually tried this with NiFi, so I don't know how easy it is to specify ASCII character 31 as your separator in the UI. On Thu, Feb 4, 2016 at 2:34 PM, Ryan Blue wrote: The underlying CSV library only supports a single-character delimiter, so it would be a bit of work to allow multi-char delimiters. Another solution is to use | as your delimiter and simply account for that in your file header. Everything is mapped by name, so you'd just have a bunch of columns named "" and it should work fine otherwise. That may not work if your delimiter is || because you might have | in your data, though. If that's the case, then I'd go with the suggestion from Joe to replace "||" with a single-character delimiter that you won't see in the data, like ☃. rb On 02/04/2016 06:50 AM, Joe Witt wrote: Not a direct answer but: With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will have a great option in scripting (Lua, Python, Ruby, Groovy, Javascript) that will let you rapidly get past these hurdles without having to build your own custom processor until you are sure what you need. Thanks Joe On Thu, Feb 4, 2016 at 6:48 AM, Tony Kurc wrote: With that processor alone it doesn't appear so. The validator for that property requires it to be one character. 
On Feb 3, 2016 1:01 AM, "shweta" wrote: Hi All, It seems "ConvertCSVtoAvro" only support single character as delimiter in Nifi. Is there a way to specify "||" delimiter. Thanks, Shweta -- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ConvertCSVtoAvro-support-for-delimiter-tp7116.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com. -- Ryan Blue Software Engineer Cloudera, Inc.
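The delimiter replacement suggested in this thread can be sketched outside NiFi roughly as follows. This is only an illustration in Python, not NiFi or processor code: the helper names `to_unit_separated` and `parse_rows` are hypothetical, and it assumes the input fits in memory and that the unit separator (ASCII character 31) never appears in the data.

```python
import csv
import io

UNIT_SEPARATOR = "\x1f"  # ASCII character 31, the "unit separator"


def to_unit_separated(text, multi_char_delimiter="||"):
    """Replace a multi-character delimiter with the single-character unit separator.

    Hypothetical helper for illustration; refuses input that already
    contains the unit separator, since the result would be ambiguous.
    """
    if UNIT_SEPARATOR in text:
        raise ValueError("input already contains the unit separator")
    return text.replace(multi_char_delimiter, UNIT_SEPARATOR)


def parse_rows(text):
    """Parse '||'-delimited text with a parser that only accepts one-character delimiters."""
    converted = to_unit_separated(text)
    return list(csv.reader(io.StringIO(converted), delimiter=UNIT_SEPARATOR))


rows = parse_rows("id||name\n1||a|b\n")
# A single "|" inside a value survives, because only "||" was replaced.
```

One caveat with this approach: a value that ends in "|" directly before the delimiter (for example "a|||b") is ambiguous, so the replacement is only safe when the data cannot produce that pattern.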
Re: ConvertCSVtoAvro | support for "||" delimiter
The underlying CSV library only supports a single-character delimiter, so it would be a bit of work to allow multi-char delimiters. Another solution is to use | as your delimiter and simply account for that in your file header. Everything is mapped by name, so you'd just have a bunch of columns named "" and it should work fine otherwise. That may not work if your delimiter is || because you might have | in your data, though. If that's the case, then I'd go with the suggestion from Joe to replace "||" with a single-character delimiter that you won't see in the data, like ☃. rb On 02/04/2016 06:50 AM, Joe Witt wrote: Not a direct answer but: With NIFI-210 arriving in the upcoming NiFi 0.5.0 release you will have a great option in scripting (Lua, Python, Ruby, Groovy, Javascript) that will let you rapidly get past these hurdles without having to build your own custom processor until you are sure what you need. Thanks Joe On Thu, Feb 4, 2016 at 6:48 AM, Tony Kurc wrote: With that processor alone it doesn't appear so. The validator for that property requires it to be one character. On Feb 3, 2016 1:01 AM, "shweta" wrote: Hi All, It seems "ConvertCSVtoAvro" only support single character as delimiter in Nifi. Is there a way to specify "||" delimiter. Thanks, Shweta -- View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/ConvertCSVtoAvro-support-for-delimiter-tp7116.html Sent from the Apache NiFi Developer List mailing list archive at Nabble.com. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: discuss nifi 0.4.1
Yes, which affects when you time getting something into master. Larger features that are done just before a release (more risk) can get pushed so that they are committed after a release instead of just before one. Regular releases ensure the penalty for choosing to get into the next release isn't too high. You could make the argument that master should always be in a releasable state, but I think that even when reviews are done right there is risk for some features. All I want to note is that a regular release cadence helps mitigate that risk when we stick to it. rb On 12/17/2015 04:32 PM, Tony Kurc wrote: I'm not sure I understand "more validation" reasoning - won't features at the end have very little validation? On Thu, Dec 17, 2015 at 7:26 PM, Ryan Blue wrote: Another reason to release 0.4.1 is to allow the additions that warrant 0.5.0 to have more validation before release. With a regular release cycle, features can go in at the beginning to have more time for catching bugs in them. I also agree with what Sean said below. rb On 12/17/2015 04:00 PM, Sean Busbey wrote: On Thu, Dec 17, 2015 at 5:50 PM, Tony Kurc wrote: s/features/buxfixes/ On Thu, Dec 17, 2015 at 6:50 PM, Tony Kurc wrote: Is there a reason to not just cut a 0.5.0 instead of grafting 0.5.0 features onto 0.4.1? This is a good question. Some downstream users might have different change processes for a patch vs minor release, so making a 0.4.1 that fixes what we determine to be a substantial gap in the 0.4 line would be nice for them. While we might be a young project now, it would be good to already have the habit practiced for when we have more users in enterprise settings. On the other hand, 0.4.0 just happened, so a release in 3 days would minimize the number of "stuck on 0.4.0" folks. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: discuss nifi 0.4.1
Another reason to release 0.4.1 is to allow the additions that warrant 0.5.0 to have more validation before release. With a regular release cycle, features can go in at the beginning to have more time for catching bugs in them. I also agree with what Sean said below. rb On 12/17/2015 04:00 PM, Sean Busbey wrote: On Thu, Dec 17, 2015 at 5:50 PM, Tony Kurc wrote: s/features/buxfixes/ On Thu, Dec 17, 2015 at 6:50 PM, Tony Kurc wrote: Is there a reason to not just cut a 0.5.0 instead of grafting 0.5.0 features onto 0.4.1? This is a good question. Some downstream users might have different change processes for a patch vs minor release, so making a 0.4.1 that fixes what we determine to be a substantial gap in the 0.4 line would be nice for them. While we might be a young project now, it would be good to already have the habit practiced for when we have more users in enterprise settings. On the other hand, 0.4.0 just happened, so a release in 3 days would minimize the number of "stuck on 0.4.0" folks. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: discuss nifi 0.4.1
Branching from master at the start of 0.4.1-SNAPSHOT and cherry-picking makes sense to me. rb On 12/17/2015 12:29 PM, Joe Witt wrote: Team, Matt Clarke just discovered an interesting case that appears to expose a defect in site-to-site. The details of it are still being worked out as you can see in NIFI-1301. This issue has been around for a very long time, but it still feels like something worth addressing in an incremental/bug release (0.4.1). I looked at the bugs already addressed on 0.5.0 and added 0.4.1 to their fix versions as well. What I am wondering about here is a bit of proper usage and thinking with Git. Would it make sense to branch off master right where 0.4.1-SNAPSHOT started, then cherry-pick the commits into this new branch, and just release that branch, never needing to merge it back to master since these fixes are all already on master anyway? https://issues.apache.org/jira/browse/NIFI-1301?jql=project%20%3D%20NIFI%20AND%20fixVersion%20%3D%200.4.1%20ORDER%20BY%20due%20ASC%2C%20priority%20DESC%2C%20created%20ASC Thanks Joe -- Ryan Blue Software Engineer Cloudera, Inc.
Re: JSON / Avro issues
Jeff, I've answered inline. Thanks for using the processor, sorry it isn't clear what's happening.

rb

On 11/05/2015 01:59 PM, Jeff wrote:

> I built a simple flow that reads a tab separated file and attempts to convert to Avro. ConvertCSVtoAvro just says that the conversion failed. Where can I find more information on what the failure was?

Information about failures is added to the "errors" attribute on files emitted to the failure relationship. Unfortunately, right now the files aren't filtered to just the failed rows. That's something we need to fix, but it does accumulate error messages so you get something like: "NumberFormatException: 'turkey' is not an integer (1,234 similar errors)"

> Using the same sample tab separated file, I create a JSON file out of it. The JSON to Avro processor also fails with very little explanation.

These processors are basically the same on the inside. :) Same place for errors. I think the problem is likely that some of the values are failing to convert to the Avro type you've selected.

> With regard to the ConvertCSVtoAvro processor: Since my file is tab delimited, do I simply open the "CSV delimiter" property, delete the ",", and hit the tab key, or is there a special syntax like ^t? My data has no CSV quote character, so do I leave this as " or delete it or check the empty box?

This could definitely be a problem. The delimiter is what you want. It works with both a tab character (I usually paste it in since the browser uses it as a movement key) and with \t, though I think there's a bug where you can't have 2-character delimiters in the validation. I should fix that.

> With regard to the ConvertJSONtoAvro: What is the expected JSON source file to look like? [ {fields values … }, {fields values …} ] Or {fields values … } {fields values …} or something else.

This should be the second case. The JSON to Avro processor can't handle JSON lists as the root just yet. You should simply concatenate JSON. The whitespace doesn't matter.
rb -- Ryan Blue Software Engineer Cloudera, Inc.
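The "concatenated JSON" input described in the reply above can be illustrated with a small Python sketch. It is not part of NiFi: the function names `array_to_concatenated` and `read_concatenated` are hypothetical, and the sketch assumes the source file is a JSON array of objects that fits in memory.

```python
import json


def array_to_concatenated(text):
    """Turn a JSON array of records into concatenated records, one per line."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array at the root")
    return "\n".join(json.dumps(record) for record in records)


def read_concatenated(text):
    """Read back-to-back JSON values; whitespace between them is ignored."""
    decoder = json.JSONDecoder()
    position = 0
    records = []
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        record, position = decoder.raw_decode(text, position)
        records.append(record)
    return records


concatenated = array_to_concatenated('[{"a": 1}, {"a": 2}]')
```

The first function rewrites the list-rooted form (the first case in the question) into the second form; the second function shows that the amount of whitespace between records does not affect what is read.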
Re: [VOTE] Release Apache NiFi 0.4.0 (rc2)
+1 (non-binding) Checked source artifact, spot-checked some license docs on new modules rb On 12/08/2015 02:58 PM, Ricky Saltzer wrote: +1 build works test works hashes/keys check out tested a couple simple workflows On Tue, Dec 8, 2015 at 4:57 PM, Joe Percivall < joeperciv...@yahoo.com.invalid> wrote: Ran through the helper: validated keys, built binaries on Windows and Mac, ran some templates. Everything worked as expected. +1 Release this package as Apache NiFi 0.4.0. - - - - - - Joseph Percivall linkedin.com/in/Percivall e: joeperciv...@yahoo.com On Tuesday, December 8, 2015 3:29 PM, Joe Witt wrote: Hello NiFi Community, I am pleased to be calling this vote for the source release of Apache NiFi 0.4.0. The source zip, including signatures, digests, and associated convenience binaries can be found at https://dist.apache.org/repos/dist/dev/nifi/nifi-0.4.0/ The staged maven artifacts of the build can be found at https://repository.apache.org/content/repositories/orgapachenifi-1065 The Git tag is nifi-0.4.0-RC2 The Git commit ID is b66c029090f395c0cbc001fd483e86895b133e46 https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=b66c029090f395c0cbc001fd483e86895b133e46 Checksums of NiFi 0.4.0 Source Release MD5: da733f8fdb520a0346dcda59940b2c12 SHA1: 82fffbc5f8d7e4724bbe2f794bdde39396dae745 Release artifacts are signed with the following key https://people.apache.org/keys/committer/joewitt.asc KEYS file available here https://dist.apache.org/repos/dist/release/nifi/KEYS 161 issues were closed/resolved for this release https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12333070 Release note highlights https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0 Migration/Upgrade guidance https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi The vote will be open for 72 hours. 
Please download the release candidate and evaluate the necessary items including checking hashes, signatures, build from source, and test. Then please vote: [ ] +1 Release this package as Apache NiFi 0.4.0 [ ] +0 no opinion [ ] -1 Do not release this package because... -- Ryan Blue Software Engineer Cloudera, Inc.
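The "checking hashes" step asked for in the vote email could be done along these lines. This is a hedged Python sketch rather than the project's release helper: `file_digests` and `matches_published` are hypothetical names, the artifact path is whatever file was downloaded, and signature verification against the KEYS file is not covered.

```python
import hashlib


def file_digests(path, chunk_size=1 << 20):
    """Return (md5_hex, sha1_hex) for a file, reading it in chunks."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
    return md5.hexdigest(), sha1.hexdigest()


def matches_published(path, expected_md5, expected_sha1):
    """Compare a downloaded artifact against the checksums published in a vote email."""
    md5, sha1 = file_digests(path)
    return md5 == expected_md5.strip().lower() and sha1 == expected_sha1.strip().lower()
```

In use, the expected values would be the MD5 and SHA1 strings quoted in the vote thread, and the path would point at the downloaded source release archive.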
Re: [VOTE] Release Apache NiFi 0.4.0 (rc1)
Great, I must have missed that. Thanks, Joe! rb On 12/07/2015 12:47 PM, Joe Witt wrote: The RC1 cancellation notice went out earlier. I'll send out another vote thread in a couple hours as the bug fixes are in. Thanks Joe On Mon, Dec 7, 2015 at 3:41 PM, Ryan Blue wrote: Is the consensus to go ahead with the release vote and this known bug, or is this a blocker? In other words, should we continue to check this release or consider the vote canceled? rb On 12/06/2015 09:11 PM, Tony Kurc wrote: Joe - I'm putting a ticket in for a fix. Looks like it was introduced by the NIFI-1246 patch. On Sun, Dec 6, 2015 at 11:36 PM, Joe Percivall < joeperciv...@yahoo.com.invalid> wrote: Yup I saw the same behavior. On the second try (doing mvn clean install -rf :nifi-standard-processors) the tailfile error went away. The listFile error still occurred though. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: joeperciv...@yahoo.com On Sunday, December 6, 2015 11:32 PM, Tony Kurc wrote: Er, just the tailfile error On Dec 6, 2015 11:31 PM, "Tony Kurc" wrote: Joe, I had this happen and it worked on a second try. On Dec 6, 2015 11:23 PM, "Joe Percivall" Windows 8 build fails with maven 3.3.3 and Java 1.8.0_65. I get these error messages: TestListFile.testRecurse:441 expected: but was: TestTailFile.testMultipleRolloversAfterHavingReadAllDataWhileStillRunning:381 expected:<[world]> but was:<[abc These were not any of the same errors I saw last time testing on Windows a couple weeks ago. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: joeperciv...@yahoo.com On Sunday, December 6, 2015 9:05 PM, Tony Kurc wrote: I've gotten confirmation of CentOS 7.1.1503 x86_64, Oracle JDK 8u66 working fine. and Fedora 23 not working with the same error that Andre reported. On Sun, Dec 6, 2015 at 5:50 PM, Tony Kurc wrote: I'll also try it on windows 10 (again x64_64) On Sun, Dec 6, 2015 at 5:36 PM, wrote: I can run it on Windows 8 tonight if no one else has. 
Joe Sent from my phone On Dec 6, 2015, at 4:09 PM, Tony Kurc wrote: Signatures and hashes look good. Built fine on Ubuntu 14.04 x86_64. I even cursed a little bit less at TestJdbcHugeStream! LICENSE, NOTICE and README look good. Docs look good. Binary ran successfully. +1 Did anyone try building on windows? On Sat, Dec 5, 2015 at 11:46 PM, Aldrin Piri < aldrinp...@gmail.com> wrote: Followed helper provided by Joe. Keys good. Signatures good. Hashes good. Source release builds and passes contrib Required docs present and look correct. Checked out copy of repo for specified commit hash and diff'd against source bundle. Commit is as anticipated. Ran convenience binary with varying templates all successfully. Release notes and upgrade/migration guides look good. Kudos to the community on all the efforts involved with this release. +1, Release this package as Apache NiFi 0.4.0 On Sat, Dec 5, 2015 at 10:32 PM, Joe Witt wrote: Hello NiFi Community, I am pleased to be calling this vote for the source release of Apache NiFi 0.4.0. 
The source zip, including signatures, digests, and associated convenience binaries can be found at: https://dist.apache.org/repos/dist/dev/nifi/nifi-0.4.0/ The Git tag is nifi-0.4.0-RC1 The Git commit ID is 191a56f54e3ec178f9f29e1287f23ba66dbf9e43 https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=191a56f54e3ec178f9f29e1287f23ba66dbf9e43 Checksums of NiFi 0.4.0 Source Release: MD5: b69fd7ec632d7569906e20508058556b SHA1: 31d88ec7a8431ba5935370eb09be7a343c46411c Release artifacts are signed with the following key: https://people.apache.org/keys/committer/joewitt.asc KEYS file available here: https://dist.apache.org/repos/dist/release/nifi/KEYS 152 issues were closed/resolved for this release: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12333070 Release note highlights: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0 Migration/Upgrade guidance: https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi The vote will be open for 72 hours. Please download the release candidate and evaluate the necessary items including checking hashes, signatures, build from source, and test. Then please vote: [ ] +1 Release this package as Apache NiFi 0.4.0 [ ] +0 no opinion [ ] -1 Do not release this package because... -- Ryan Blue Software Engineer Cloudera, Inc. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [VOTE] Release Apache NiFi 0.4.0 (rc1)
Is the consensus to go ahead with the release vote and this known bug, or is this a blocker? In other words, should we continue to check this release or consider the vote canceled? rb On 12/06/2015 09:11 PM, Tony Kurc wrote: Joe - I'm putting a ticket in for a fix. Looks like it was introduced by the NIFI-1246 patch. On Sun, Dec 6, 2015 at 11:36 PM, Joe Percivall < joeperciv...@yahoo.com.invalid> wrote: Yup I saw the same behavior. On the second try (doing mvn clean install -rf :nifi-standard-processors) the tailfile error went away. The listFile error still occurred though. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: joeperciv...@yahoo.com On Sunday, December 6, 2015 11:32 PM, Tony Kurc wrote: Er, just the tailfile error On Dec 6, 2015 11:31 PM, "Tony Kurc" wrote: Joe, I had this happen and it worked on a second try. On Dec 6, 2015 11:23 PM, "Joe Percivall" Windows 8 build fails with maven 3.3.3 and Java 1.8.0_65. I get these error messages: TestListFile.testRecurse:441 expected: but was: TestTailFile.testMultipleRolloversAfterHavingReadAllDataWhileStillRunning:381 expected:<[world]> but was:<[abc These were not any of the same errors I saw last time testing on Windows a couple weeks ago. Joe - - - - - - Joseph Percivall linkedin.com/in/Percivall e: joeperciv...@yahoo.com On Sunday, December 6, 2015 9:05 PM, Tony Kurc wrote: I've gotten confirmation of CentOS 7.1.1503 x86_64, Oracle JDK 8u66 working fine. and Fedora 23 not working with the same error that Andre reported. On Sun, Dec 6, 2015 at 5:50 PM, Tony Kurc wrote: I'll also try it on windows 10 (again x64_64) On Sun, Dec 6, 2015 at 5:36 PM, wrote: I can run it on Windows 8 tonight if no one else has. Joe Sent from my phone On Dec 6, 2015, at 4:09 PM, Tony Kurc wrote: Signatures and hashes look good. Built fine on Ubuntu 14.04 x86_64. I even cursed a little bit less at TestJdbcHugeStream! LICENSE, NOTICE and README look good. Docs look good. Binary ran successfully. 
+1 Did anyone try building on windows? On Sat, Dec 5, 2015 at 11:46 PM, Aldrin Piri < aldrinp...@gmail.com> wrote: Followed helper provided by Joe. Keys good. Signatures good. Hashes good. Source release builds and passes contrib Required docs present and look correct. Checked out copy of repo for specified commit hash and diff'd against source bundle. Commit is as anticipated. Ran convenience binary with varying templates all successfully. Release notes and upgrade/migration guides look good. Kudos to the community on all the efforts involved with this release. +1, Release this package as Apache NiFi 0.4.0 On Sat, Dec 5, 2015 at 10:32 PM, Joe Witt wrote: Hello NiFi Community, I am pleased to be calling this vote for the source release of Apache NiFi 0.4.0. The source zip, including signatures, digests, and associated convenience binaries can be found at: https://dist.apache.org/repos/dist/dev/nifi/nifi-0.4.0/ The Git tag is nifi-0.4.0-RC1 The Git commit ID is 191a56f54e3ec178f9f29e1287f23ba66dbf9e43 https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=191a56f54e3ec178f9f29e1287f23ba66dbf9e43 Checksums of NiFi 0.4.0 Source Release: MD5: b69fd7ec632d7569906e20508058556b SHA1: 31d88ec7a8431ba5935370eb09be7a343c46411c Release artifacts are signed with the following key: https://people.apache.org/keys/committer/joewitt.asc KEYS file available here: https://dist.apache.org/repos/dist/release/nifi/KEYS 152 issues were closed/resolved for this release: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12333070 Release note highlights: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.4.0 Migration/Upgrade guidance: https://cwiki.apache.org/confluence/display/NIFI/Migration+Guidance https://cwiki.apache.org/confluence/display/NIFI/Upgrading+NiFi The vote will be open for 72 hours. 
Please download the release candidate and evaluate the necessary items including checking hashes, signatures, build from source, and test. Then please vote: [ ] +1 Release this package as Apache NiFi 0.4.0 [ ] +0 no opinion [ ] -1 Do not release this package because... -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [ANNOUNCE] New Apache NiFi Committer Ricky Saltzer
Nice job, Ricky! On 10/21/2015 01:04 PM, Ricky Saltzer wrote: I am very excited for the opportunity and can't wait to help drive this project forward with the rest of you! Ricky On Wed, Oct 21, 2015 at 3:05 PM, dan bress wrote: Welcome Ricky! Thanks for your contributions and I look forward to you pushing Apache NiFi forward! On Wed, Oct 21, 2015 at 3:04 PM Tony Kurc wrote: NiFi Community! Great news! On behalf of the Apache NiFi PMC, I am very pleased to announce that Ricky Saltzer has accepted the PMC's invitation to become a committer on the Apache NiFi project. We greatly appreciate all of Ricky's hard work and generous contributions to the project. We look forward to his continued involvement in the project. Welcome Ricky, and congratulations! Tony -- Ryan Blue Software Engineer Cloudera, Inc.
Re: Source code for Version 0.3.0
+1 for a nifi-0.3.0 release tag. Signed is even better, but I don't think I'd mind if it weren't signed. rb On 09/21/2015 06:35 AM, Sean Busbey wrote: The pattern I've liked the most on other projects is to create a proper release tag, signed by the RM on passage of the release vote. I don't recall off-hand what the phrasing was in the VOTE thread (if any). On Mon, Sep 21, 2015 at 8:13 AM, Adam Taft wrote: What are the thoughts on creating a proper 0.3.0 tag, as would be traditional for a final release? It is arguably a little confusing to only have the RC tags when looking for the final release. I found this head scratching for 0.2.0 as well. Adam -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [VOTE] Release Apache NiFi 0.3.0
+1 (non-binding) * Checked build signature, checksums * Built nifi-0.3.0 * Ran rat checks rb On 09/16/2015 02:05 AM, Jennifer Barnabee wrote: Everything checked out for me. +1 (binding) - Release this package as nifi-0.3.0 -Jenn On Mon, Sep 14, 2015 at 11:13 PM, Matt Gilman wrote: Hello I am pleased to be calling this vote for the source release of Apache NiFi 0.3.0. The source zip, including signatures, digests, etc. can be found at: https://repository.apache.org/content/repositories/orgapachenifi-1060 The Git tag is nifi-0.3.0-RC1 The Git commit ID is 2ec735e35025fed3c63d51128ec0609ffe1fa7e3 https://git-wip-us.apache.org/repos/asf?p=nifi.git;a=commit;h=2ec735e35025fed3c63d51128ec0609ffe1fa7e3 Checksums of nifi-0.3.0-source-release.zip: MD5: 0bca350d5d6d9c9a459304253b8121c4 SHA1: 4b14bf1c0ddc3d970ef44dac93e716e9e6964842 Release artifacts are signed with the following key: https://people.apache.org/keys/committer/mcgilman.asc KEYS file available here: https://dist.apache.org/repos/dist/release/nifi/KEYS 89 issues were closed/resolved for this release: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12329653 Release note highlights can be found here: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version0.3.0 The vote will be open for 72 hours. Please download the release candidate and evaluate the necessary items including checking hashes, signatures, build from source, and test. Then please vote: [ ] +1 Release this package as nifi-0.3.0 [ ] +0 no opinion [ ] -1 Do not release this package because... -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [DISCUSS] Feature proposal: First-class Avro Support
On 08/12/2015 06:09 PM, Bryan Bende wrote: All, Given how popular Avro has become, I'm very interested in making progress on providing first-class support within NiFi. I took a stab at filling in some of the requirements on the Feature Proposal Wiki page [1] and wanted to get feedback from everyone to see if these ideas are headed in the right direction. Are there any major features missing from that list? Any other recommendations? I'm also proposing that we create a new Avro bundle to capture the functionality that is decided upon, and we can consider whether any of the existing Avro-specific functionality in the Kite bundle could eventually move to the Avro bundle. If anyone feels strongly about this, or has an alternative recommendation, let us know. [1] https://cwiki.apache.org/confluence/display/NIFI/First-class+Avro+Support Thanks, Bryan

Thanks for putting this together, Bryan! I have a few thoughts and observations about the proposal:

* Conversion to Avro is an easier problem than conversion from Avro. Item #2 is to convert from Avro to other formats like CSV, but that isn't possible for some Avro schemas. For example, Avro supports nested lists and maps that have no good representation in CSV so we'll have to be careful about that conversion. It is possible for a lot of data and is definitely valuable, though.
* For #3, converting Avro records, I'd also like to see the addition of transformation expressions. For example, I might have a timestamp in seconds that I need to convert to the Avro timestamp-millis type by multiplying the value by 1000.
* There are a few systems like Flume that use Avro serialization for individual records, without the Avro file container. This complicates behavior a bit. Your suggestion to have merge/split is great, but we should plan on having a couple of scenarios for it:
  - Merge/split between files and bare records with schema header
  - Merge/split Avro files to produce different sized files
* The "extract fingerprint" processor could be more general and populate a few fields from the Avro header:
  - Schema definition (full, not fp)
  - Schema fingerprint
  - Schema root record name (if schema is a record)
  - Key/value metadata, like compression codec
* It looks like #7, evaluate paths, and #8, update records, are intended for the case where the content is a bare Avro record. I'm not sure that evaluating paths would work for Avro files.
* For the update records processor, this is really similar to the processor to convert between Avro schemas, #3. I suggest merging the two and making it easy to work with either a file or a record via record-level callback. This would be useful elsewhere as well. Maybe tell the difference between file and record by checking for the filename attribute?

On the subject of where these processors go, I'm not attached to them being in the Kite bundle. It would probably be better to separate that out. However, there are some specific features in the Kite bundle that I think are really valuable:

- Use a schema file from a HDFS path (requires Hadoop config)
- Use the current schema of a dataset/table

Those make it possible to update a table schema, then have that change propagate to the conversion in NiFi. So if I start receiving a new field in my JSON data, I just update a table definition and then the processor picks up the change either automatically or with a restart.

The other complication is that the libraries for reading JSON and CSV (and from an InputFormat if you are interested) are in Kite, so you'll have a Kite dependency either way. We can look at separating the support into stand-alone Kite modules or moving it into the upstream Avro project.

Overall, this looks like a great addition!
rb -- Ryan Blue Software Engineer Cloudera, Inc.
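Two of the points in the reply above (schemas that cannot be represented in CSV, and transformation expressions such as seconds to timestamp-millis) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: it works on an Avro schema already parsed from JSON into plain dicts and lists, it does not use the Avro or Kite libraries, and the names `is_flat_type`, `csv_compatible`, and `apply_transforms` are hypothetical.

```python
# Avro primitive type names, as defined by the Avro specification.
PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}


def is_flat_type(avro_type):
    """True if the type is a primitive or a union of primitives.

    Deliberately conservative: records, arrays, maps, enums, fixed and
    named type references are all treated as not flat in this sketch.
    """
    if isinstance(avro_type, str):
        return avro_type in PRIMITIVES
    if isinstance(avro_type, list):  # a union is written as a JSON list
        return all(is_flat_type(branch) for branch in avro_type)
    if isinstance(avro_type, dict):  # e.g. a primitive annotated with a logicalType
        inner = avro_type.get("type")
        return isinstance(inner, str) and inner in PRIMITIVES
    return False


def csv_compatible(schema):
    """True if every field of a record schema could be written as one CSV column."""
    return schema.get("type") == "record" and all(
        is_flat_type(field["type"]) for field in schema["fields"]
    )


def apply_transforms(record, transforms):
    """Apply per-field transformation functions; other fields pass through unchanged."""
    return {
        name: transforms[name](value) if name in transforms else value
        for name, value in record.items()
    }


# Seconds since the epoch converted to milliseconds, as in the example above.
converted = apply_transforms(
    {"id": 7, "created": 1439400000},
    {"created": lambda seconds: seconds * 1000},
)
```

A check like `csv_compatible` would let a conversion fail early with a clear message instead of producing a lossy CSV, and the transform mapping is one simple way a record-level callback could be expressed.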
Re: [DISCUSS] Removal of the 'master' vs 'develop' distinction
+1 to removing the distinction. Master is the default branch in a lot of projects and I would argue that is the common expectation. It sounds like we can do gitflow without a separate develop branch (or at least it isn't too painful) so doing what new people tend to expect is a good thing. rb On 08/13/2015 12:55 PM, Mark Payne wrote: I think the issue here is less about gitflow being "hard" and more about it being confusing. We have had numerous people write to the dev list about why the thing that they checked out doesn't have what they expect. Even being very experienced with NiFi, I've cloned the repo a couple of times to new VM's and forgotten to checkout develop before proceeding. I think that gitflow has its merits, but like any other avenue we go down, it's important to weigh pros against cons. Frankly, I think that anything that leads to confusion for newcomers (thereby discouraging community growth) had better have some very strong benefits. That being said, I don't personally see a lot of benefit in this environment, so I would be a +1 to remove the distinction between the two branches. Date: Thu, 13 Aug 2015 15:45:00 -0400 Subject: Re: [DISCUSS] Removal of the 'master' vs 'develop' distinction From: a...@adamtaft.com To: dev@nifi.apache.org The difficulties of using the gitflow workflow and the release process can be significantly reduced with good tooling. I'm currently using the jgit-flow [1][2] maven plugin with very good success. It handles and manages feature, release, and hotfix branches seamlessly. And it avoids common problems with the normal maven release plugin for gitflow. Before abandoning gitflow, the community should seriously consider tooling that makes it more usable. I'm not going to argue the merits of gitlab flow or any other workflows. But clearly, abandoning gitflow because it's "hard" is likely the wrong driver, if tooling exists to make it better. 
[1] http://blogs.atlassian.com/2013/05/maven-git-flow-plugin-for-better-releases/ [2] https://bitbucket.org/atlassian/jgit-flow/wiki/Home On Thu, Aug 13, 2015 at 2:58 PM, Bryan Bende wrote: If we worked on master and had a prod branch that was the last release, then we have the same thing we do now, just with different names. This would be GitLab Flow as Brandon pointed out. That being said, I don't have experience with the release process, and maybe the prod branch does not provide any value for us. The prod branch would normally be used to create quick fix branches based off production, or when doing automated/continuous deployments to a production system, but if we aren't doing either of those things then maybe it is not worth it. -Bryan On Thu, Aug 13, 2015 at 2:23 PM, Brandon DeVries wrote: Personally, I still think GitLab Flow[1] is all we need for us to be Really Useful Engines. [1] https://about.gitlab.com/2014/09/29/gitlab-flow/ Brandon On Thu, Aug 13, 2015 at 2:15 PM Joe Witt wrote: Resending On Aug 13, 2015 12:22 PM, "Joe Witt" wrote: Team, It was proposed by Ryan Blue on another thread that we consider dropping the master vs develop distinction. In the interest of his, in my view, very good point I didn't want it to get buried in that thread. [1] is the thread when we last discussed gitflow/develop/master on entry to the incubator. And from that thread here is the part I wish I had better understood when the wise Mr Benson said it: "Another issue with gitflow is the master branch. The master branch is supposed to get merged to for releases. The maven-release-plugin won't do that, and the jgitflow plugin is unsafe. So one option is to 'use gitflow' but not bother with the master versus develop distinction, the other is to do manual merges to master at release points." I think we should follow this guidance: "'use gitflow' but not bother with the master versus develop distinction". 
I say this from having done the release management job now a couple of times including having done a 'hotfix'. My comments here are not a rejection of that master/develop concept in general. It is simply pointing out that for the Apache NiFi community it is not adding value but is creating confusion and delay [2]. Thanks Joe [1] http://s.apache.org/GIW [2] Sir Topham Hatt - Thomas and Friends (tm) -- Ryan Blue Software Engineer Cloudera, Inc.
Re: eliminate nifi-parent, split out nifi-nar-maven-plugin, have nifi in its own tree
What is the current distinction between master and develop? Master is stable and develop is where new changes go? The reason I suggest just having master is that it follows the convention that other projects use. Master is where new development happens and releases or more stable branches are marked appropriately. rb On 08/13/2015 08:46 AM, Joe Witt wrote: All, Am filing the infra tickets now. I forgot that we had 'nifi-site' at the root level too. So requesting two new git repositories in Apache Infra. Will not be asking to have them mirrored to Github as it doesn't seem worth it/necessary. 'nifi-maven' https://issues.apache.org/jira/browse/INFRA-10119 'nifi-site' https://issues.apache.org/jira/browse/INFRA-10120 Actions: Once these two new git repositories are created I will move the appropriate nifi-nar-maven-plugin items into it and terminate the current directory. Then I'll move the nifi-site directory content into the new nifi-site repository and then delete the directory. Once that is sorted we can discuss whether we care to keep develop/master or simply go to master as Ryan suggests. Thanks Joe On Mon, Aug 10, 2015 at 5:13 PM, Joe Witt wrote: Ryan Correct, the latest code depends on latest nifi nar maven plugin. I would be absolutely fine personally with eliminating develop and just using master. Given that the releases are tagged I personally don't get the value here vs the extra work required. Anybody feel strongly for keeping master and dev as they are and if so can you please state how the current model has helped you contribute or how the proposed model would not? Thanks Joe On Aug 10, 2015 11:43 AM, "Ryan Blue" wrote: +1 I think separate git repos is a great idea. One thing to clarify, too: most of the time the nifi project relies on the last nifi-nar-maven-plugin release, right? So that should be transparent for most people building the project. 
It would only be awkward for someone updating the maven plugin and testing it out locally because the develop branch should always track a release. Speaking of the develop branch... what about using master like most projects after this change? rb On 08/10/2015 07:32 AM, Joe Witt wrote: Team, We've seen and heard the confusion of folks trying to build NiFi's goofy three step build process with parent, nar plugin, and nifi. I propose to do the following: 1) Eliminate the nifi-parent by pushing anything necessary back into nifi-nar-maven-plugin. The DRY concept is valid but just not worth a third project at this point given how little meaningful repetition it avoids. 2) Create a new apache git repo for 'nifi-maven-plugins' and move the 'nifi-nar-maven-plugin' content into it. 3) Remove the nifi-parent and nifi-nar-maven-plugin from nifi folder and promote the current 'nifi' sub folder to the top level. Why: Folks are confused as to why they need to build all three and it is odd that in a given project folder you would have to build each manually. It is just not a generally appreciated fact that you cannot have a dependency on a maven plugin within the same reactor build that builds that plugin. By cleaning this up people can just download the source and build it. We don't have to have any protracted build cycles for 'nifi maven plugins' anymore leaving dependency on a snapshot in the nifi tree. If there seems to be consensus on this I'll put in the infra ticket soon. Thanks Joe -- Ryan Blue Software Engineer Cloudera, Inc. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [DISCUSS] Feature proposal: Read-only mode as default
+1 for the consensus view as Joe summarized it. I also agree with only using confirmations sparingly. rb On 08/11/2015 07:07 AM, Ricky Saltzer wrote: +1 for read-only by default. It would be nice to have some easy way to tell if you're in edit/view mode, perhaps the canvas be black/white during view and color during edit? or, something along those lines. On Tue, Aug 11, 2015 at 9:57 AM, Michael Moser wrote: "undo" seems to be the stretch goal that I think that would solve most concerns of unintended modifications to a graph. +1 Meanwhile, I'd like to caution against confirmation dialogs. Extra clicks quickly annoy users while they work. I suggest no dialog when deleting a single queue or processor, or when moving 'objects' around. Perhaps bring a confirmation dialog into play only when deleting more than 1 'object'. Personally I really like the idea of a read-only mode toggle, even if it was not persisted as a user preference and was only remembered during the current browser 'session'. -- Mike On Tue, Aug 11, 2015 at 9:11 AM, Rob Moran wrote: The consensus view looks good to me. I believe preserving the current model as Joe describes it is a smart approach. An undo action and restrained use of confirmation dialogs are minimal and should not significantly impede experienced operators. More often than not, I'd bet a user would expect similar functionality. As is evident by the views expressed around read-only/locking, it will be very difficult to please a majority of users with different user modes and UI states. On Tue, Aug 11, 2015 at 8:29 AM Joe Witt wrote: To summarize where we're at ... Proposed approaches (summary): - Establish a default read-only view whereby an operator can enable edit mode. Use confirmation dialogs for deletes. - Keep the current model but add support for ‘undo’. - Let the user choose whether to lock their view or not as user preference. 
- For delete add more protections to make accidents less likely and for movement provide an explicit ‘move action’. The idea of locking seems to have some strong views on both sides and both sides have even been argued by the same people (i now count myself among that group). It looks like a consensus view there though is: - Try to make panning the view of the flow and moving components on the flow two specific/discrete actions to avoid accidental movement. - Add support for undo - Provide sufficient dialog/protection for delete cases. This preserves the model whereby we generally trust that the user will do the right thing and we’ll do more to help them and that when they don’t they will learn and have help to promptly restore a good state. How do folks feel about that? Thanks Joe On Tue, Aug 11, 2015 at 5:11 AM, Alex Moundalexis wrote: Counterpoint, accidents happen; I prefer to enable users to learn from mistakes and exercise more care next time. You can't remove every mildly sharp edge without impacting experienced operators; resist the urge at a few comments to dumb down the tool. If some protection is added to the UI, suggest an option for "expert mode" that retains original functionality... that way experienced operators aren't affected. Alex Moundalexis Senior Solutions Architect Cloudera Partner Engineering Sent from a mobile device; please excuse brevity. On Aug 7, 2015 10:31 PM, "Joe Witt" wrote: Team, We've been hearing from users of nifi that it is too easy to accidentally move something on the flow or delete large portions of the flow. This is the case when panning the screen for example and accidentally having a processor selected instead. What is worth consideration then is the notion of making the flow 'read-only' by default. In this case the user would need to explicitly 'enable edit mode'. We would then also support a confirmation dialog or similar construct whenever deleting components on the flow. Anyone have a strong objection to this concept? 
If so, do you have an alternative in mind that would help avoid accidental movement? Thanks Joe -- Rob -- Ryan Blue Software Engineer Cloudera, Inc.
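The consensus points summarized above (panning and moving as two discrete actions, plus undo) can be sketched roughly as follows. This is only an illustration and not NiFi code: the Canvas class, its method names, and the (x, y) position model are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Hypothetical flow canvas; not part of NiFi."""
    positions: dict = field(default_factory=dict)  # component id -> (x, y)
    view_offset: tuple = (0, 0)
    _undo: list = field(default_factory=list)      # history of (action, id, old position)

    def pan(self, dx, dy):
        # Panning only shifts the viewport, so it can never move a component.
        x, y = self.view_offset
        self.view_offset = (x + dx, y + dy)

    def move(self, cid, dx, dy):
        # Moving is an explicit, separate action and is recorded for undo.
        old = self.positions[cid]
        self._undo.append(("move", cid, old))
        self.positions[cid] = (old[0] + dx, old[1] + dy)

    def delete(self, cid):
        # Deleting records the old position so the component can be restored.
        old = self.positions.pop(cid)
        self._undo.append(("delete", cid, old))

    def undo(self):
        # Restore the state recorded by the most recent move or delete.
        if not self._undo:
            return False
        _, cid, old = self._undo.pop()
        self.positions[cid] = old
        return True
```

With this split, an accidental drag while panning changes nothing on the flow, and an accidental move or delete costs one undo() call rather than a confirmation dialog on every action.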
Re: [DISCUSS] Feature proposal: Read-only mode as default
If we're talking about read-only mode as a way to avoid moving processors when moving around on the graph, what about implementing something slightly different? What about having a toggle between drag-to-scroll and drag-to-move? Then users could keep the toggle in drag-to-scroll most of the time (and undo would put things back when the inevitable accident happens). Then deletes would be handled by more sophisticated rules, like not deleting processors that are off the screen without confirmation. rb On 08/10/2015 12:58 PM, Dan Bress wrote: +1 to exactly what Mark described in his last email for a system wide preference. Although I'm curious how you leave read-only mode and get into edit mode? And how do you leave edit mode and go back to read-only mode? On one hand, if I do not intend to edit the graph and I accidentally move a processor I probably don't want it to prompt me "do you want to enter edit mode?" -In read-only mode, I think it would be a nice user experience to click anywhere on the graph(including on a processor) and it moves the entire graph. On the other hand, if I right click a processor and hit configure I'd like to leave read-only mode and go into edit mode. I'm not sure I'd even want to be prompted with "do you want to enter edit mode?" here since I obviously do. Dan Bress Software Engineer ONYX Consulting Services From: Mark Payne Sent: Monday, August 10, 2015 3:43 PM To: dev@nifi.apache.org Subject: RE: [DISCUSS] Feature proposal: Read-only mode as default I'm definitely a +1. I accidentally drag processors all the time when I'm panning around a large graph. I can understand how someone would get annoyed with this, though, and I can also appreciate the desire to not start storing user preferences. However, I think we should probably at least supply a system-level configuration for whether or not to have "read-only" the default mode. Then administrators can turn it on or off, depending on the users of the system. 
Date: Sat, 8 Aug 2015 20:50:07 -0400 Subject: Re: [DISCUSS] Feature proposal: Read-only mode as default From: a...@adamtaft.com To: dev@nifi.apache.org Thinking about it some more, I don't mind the concept of "read only" toggle. Maybe it's not set by default, but it could be a really easy UI element to add somewhere. Just a little slider-toggle element. [1] In theory, this might be a UI only feature, it wouldn't strictly need support in the backend api (just guessing). The logic is seemingly already there, similar user experience as non-DFMs. Anyway, +1 to the concept of read-only toggle mode. -1 for making it default, but the user interface element should be easy to work either way. Also agree that undo support might be free if versioning is added. [1] http://turbo.premiumpixels.com/wp-content/uploads/2011/04/preview2.jpg On Sat, Aug 8, 2015 at 3:05 PM, Joe Witt wrote: Ryan - the other useful case for read-only is basically when one is scanning around the graph and accidentally moves a processor or relationship. By no means a big deal. The idea here was to make it explicit though that the user wishes to go into an edit mode. I do think the undo mechanism plays well and you're right that we can just focus on tightening up the delete case. Sounds like the prevailing view is to avoid read-only as a mode but rather to make it more explicit whenever we delete - and potentially we could make moving more explicit too, rather than relying on a click-and-drag, which is ambiguous with panning. On Sat, Aug 8, 2015 at 2:57 PM, Ryan Blue wrote: I'm not a big fan of having a read-only mode by default. It sounds like something that would be frustrating for users when they try to make changes and then have to figure out how to switch modes. I think a clearer picture of the problem we're trying to solve would help my understanding. I'm primarily thinking of the delete case you mentioned with these comments... 
If we're talking about accidentally deleting processors, then the current mechanisms (IIRC) work pretty well: not deleting a running processor, one that has live incoming connections, etc. If those rules are insufficient, I would explore extending them rather than having a global read-only mode. For the case where the wrong processor is selected because it is off the screen, maybe having the confirmation pop up if anything affected wasn't displayed to confirm? That way we don't have confirmations all the time but still don't do unexpected things. I really like the idea of "undo" as well. If that is limited to processors that weren't running (because you can't delete ones that are), then that makes the undo operation easier to implement. rb On 08/08/2015 11:31 AM, J
Re: eliminate nifi-parent, split out nifi-nar-maven-plugin, have nifi in its own tree
+1 I think separate git repos is a great idea. One thing to clarify, too: most of the time the nifi project relies on the last nifi-nar-maven-plugin release, right? So that should be transparent for most people building the project. It would only be awkward for someone updating the maven plugin and testing it out locally because the develop branch should always track a release. Speaking of the develop branch... what about using master like most projects after this change? rb On 08/10/2015 07:32 AM, Joe Witt wrote: Team, We've seen and heard the confusion of folks trying to build NiFi's goofy three step build process with parent, nar plugin, and nifi. I propose to do the following: 1) Eliminate the nifi-parent by pushing anything necessary back into nifi-nar-maven-plugin. The DRY concept is valid but just not worth a third project at this point given how little meaningful repetition it avoids. 2) Create a new apache git repo for 'nifi-maven-plugins' and move the 'nifi-nar-maven-plugin' content into it. 3) Remove the nifi-parent and nifi-nar-maven-plugin from nifi folder and promote the current 'nifi' sub folder to the top level. Why: Folks are confused as to why they need to build all three and it is odd that in a given project folder you would have to build each manually. It is just not a generally appreciated fact that you cannot have a dependency on a maven plugin within the same reactor build that builds that plugin. By cleaning this up people can just download the source and build it. We don't have to have any protracted build cycles for 'nifi maven plugins' anymore leaving dependency on a snapshot in the nifi tree. If there seems to be consensus on this I'll put in the infra ticket soon. Thanks Joe -- Ryan Blue Software Engineer Cloudera, Inc.
Re: [DISCUSS] Feature proposal: Read-only mode as default
I'm not a big fan of having a read-only mode by default. It sounds like something that would be frustrating for users when they try to make changes and then have to figure out how to switch modes. I think a clearer picture of the problem we're trying to solve would help my understanding. I'm primarily thinking of the delete case you mentioned with these comments... If we're talking about accidentally deleting processors, then the current mechanisms (IIRC) work pretty well: not deleting a running processor, one that has live incoming connections, etc. If those rules are insufficient, I would explore extending them rather than having a global read-only mode. For the case where the wrong processor is selected because it is off the screen, maybe having the confirmation pop up if anything affected wasn't displayed to confirm? That way we don't have confirmations all the time but still don't do unexpected things. I really like the idea of "undo" as well. If that is limited to processors that weren't running (because you can't delete ones that are), then that makes the undo operation easier to implement. rb On 08/08/2015 11:31 AM, Joe Witt wrote: I can dig the user pref aspect but it would mean we start storing user prefs which is a bummer. On Aug 8, 2015 1:42 PM, "Tony Kurc" wrote: Personally not a fan of the idea. Maybe something analogous to something like 'lock the taskbar' in Windows that can have a system default setting and a user preference of on or off. On Sat, Aug 8, 2015 at 11:44 AM, johny casanova < computertech2...@gmail.com> wrote: I agree it is easy to move or delete something by mistake. Some flows are huge or are using more resources and are slower to load and you can accidentally do something by mistake. I believe the "are you sure you want to delete?" is a good start. On Aug 7, 2015 10:31 PM, "Joe Witt" wrote: Team, We've been hearing from users of nifi that it is too easy to accidentally move something on the flow or delete large portions of the flow. 
This is the case when panning the screen for example and accidentally having a processor selected instead. What is worth consideration then is the notion of making the flow 'read-only' by default. In this case the user would need to explicitly 'enable edit mode'. We would then also support a confirmation dialog or similar construct whenever deleting components on the flow. Anyone have a strong objection to this concept? If so, do you have an alternative in mind that would help avoid accidental movement? Thanks Joe -- Ryan Blue Software Engineer Cloudera, Inc.
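The delete rules suggested in this message (refuse running processors, and ask for confirmation only when something affected is off the screen) might look roughly like the sketch below. Everything here is hypothetical: Component, Viewport, and delete_decision are names invented for illustration, and the real NiFi checks (for example live incoming connections) are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a processor on the flow."""
    name: str
    x: float
    y: float
    running: bool = False


@dataclass(frozen=True)
class Viewport:
    """Hypothetical visible region of the canvas."""
    left: float
    top: float
    width: float
    height: float

    def contains(self, c):
        # A component counts as displayed if its position lies inside the view.
        return (self.left <= c.x <= self.left + self.width
                and self.top <= c.y <= self.top + self.height)


def delete_decision(selection, viewport):
    """Return 'refuse', 'confirm', or 'delete' for a selection of components."""
    # Existing-style rule: running processors are never deleted.
    if any(c.running for c in selection):
        return "refuse"
    # Proposed rule: confirm only if something affected is not displayed.
    if any(not viewport.contains(c) for c in selection):
        return "confirm"
    return "delete"
```

Under these assumptions a delete of what the user can actually see goes through without a dialog, so confirmations stay rare and only appear for the surprising case.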
Re: Write-Ahead-Log package name change?
The change sounds pretty safe to me and I wouldn't expect the WAL to be public. I agree that the public API needs to be well defined, though, because that's really how this should be decided. rb On 08/05/2015 01:01 PM, Joe Witt wrote: I am too. And I think we should document precisely what is public and what is private across the entire codebase. On Wed, Aug 5, 2015 at 3:02 PM, Dan Bress wrote: I'm fine with the package name being changed in 0.3.0 Dan Bress Software Engineer ONYX Consulting Services From: Mark Payne Sent: Wednesday, August 5, 2015 3:01 PM To: dev@nifi.apache.org Subject: RE: Write-Ahead-Log package name change? Ryan, The WAL is certainly not defined in the nifi-api. But it does live in the nifi-commons module. Not entirely sure if i would consider it "public" or not. My suggestion is to change the package name for the 0.3.0 release, which is a minor version. Thanks -Mark Date: Wed, 5 Aug 2015 11:38:02 -0700 From: b...@cloudera.com To: dev@nifi.apache.org Subject: Re: Write-Ahead-Log package name change? Is the WAL a public API? I thought that it was internal, in which case a rename should be fine. Otherwise we would have to bump the major version number (or minor depending on discussion) to account for the change. rb On 08/03/2015 11:53 AM, Mark Payne wrote: Hello, I recently realized that the nifi-write-ahead-log module (under nifi-commons) is using a package name of "org.wali" instead of "org.apache.nifi.wal" This has been the package name since the software was open sourced, unfortunately. I would like to change the package name for the 0.3.0 version of NiFi, if there are no objections. The pre-0.3.0 versions would, of course, still be available if anyone has a dependency on the classes, but I would like to get this fixed so that it is correct going forward. Is there any reason that we cannot change this for the 0.3.0 release? Thanks -Mark -- Ryan Blue Software Engineer Cloudera, Inc. -- Ryan Blue Software Engineer Cloudera, Inc.
Re: Write-Ahead-Log package name change?
Is the WAL a public API? I thought that it was internal, in which case a rename should be fine. Otherwise we would have to bump the major version number (or minor depending on discussion) to account for the change. rb On 08/03/2015 11:53 AM, Mark Payne wrote: Hello, I recently realized that the nifi-write-ahead-log module (under nifi-commons) is using a package name of "org.wali" instead of "org.apache.nifi.wal" This has been the package name since the software was open sourced, unfortunately. I would like to change the package name for the 0.3.0 version of NiFi, if there are no objections. The pre-0.3.0 versions would, of course, still be available if anyone has a dependency on the classes, but I would like to get this fixed so that it is correct going forward. Is there any reason that we cannot change this for the 0.3.0 release? Thanks -Mark -- Ryan Blue Software Engineer Cloudera, Inc.