Fw: Apache JIRA vs. github "issues"

2021-09-25 Thread Beckerle, Mike
FYI: a pretty interesting ongoing discussion on users@infra, which I am CC'd on, 
about using github issues vs. Apache JIRA, and the experience of Apache 
Airflow in switching to GH.


From: Jarek Potiuk 
Sent: Saturday, September 25, 2021 7:13 AM
To: Juan Pablo Santos Rodríguez 
Cc: Beckerle, Mike ; us...@infra.apache.org 

Subject: Re: Apache JIRA vs. github "issues"

* Did you export / import the jira issues to github?

We initially thought about that and even started doing it, but eventually we 
decided not to do that and "leave" the JIRA issues behind. We moved some 
"important" ones, and then we informed everyone and asked for help with that in 
devlist/userlist/slack etc.: if people were still interested in their issue, 
they could copy it over. And we kept info about it in our README for quite a 
while. A lot of people did.

I personally think this is a really great way to engage with the community and 
ask them to help. We have to remember that as committers and PMC members we do 
not have to do everything ourselves - we can always reach out to our community 
for help. And it worked really nicely. Those authors of issues who did not do 
this were apparently not interested any more, or maybe they did not follow the 
issues they created, or maybe the issues were gone already (or even if they 
were real issues there was no-one to verify them) so we let the issues "rot" 
there.

That was a very good choice. A lot of issues we had in jira were already 
outdated or of poor quality, so that automatically cleaned up the state of our 
issues. I personally think that if it is not obvious that an issue is really 
important, and the author of the issue is not interested in adding extra 
information when asked or is not following up - such issues are better 
"forgotten". They add no value to the project, they only add "noise". This is 
why I love GitHub discussions so much. We can convert an issue to a GitHub 
Discussion if we look at it and it is likely the issue is caused by user error, 
a deployment issue, etc. This does not "close" the issue (which is quite mean) - 
but it moves the "responsibility" for continuing the discussion onto the author 
- it's a very clear sign that the discussion might be left in the state of 
"discussing it" and there is no intention or expectation that it will be fixed. 
And we can always create an issue from the discussion if we come to the 
conclusion that it is a real issue. This has already happened in the past.

** if so, how? I've found several articles/projects ([#1], [#2], [#3], [#4], 
[#5]) but they all seem to be customized to specific projects' needs...

See above. We crowdsourced it by asking the authors to move the issues to GH 
:D. Not a "tool", but it was a great choice for user engagement, community 
building, etc.

** how is an issue assigned to several fix versions translated to gh issues? 
Was any markdown conversion between jira and gh done (issue descriptions, 
comments)?

See above. ^^ :).

** If not, how do you handle the issues on the jira side?

We just closed JIRA for new issue entry, and I think we left a comment in the CWIKI 
space (which we used much more back then) that the GH issues are now being used.

* How do you deal with security reports inside github issues?

We have those really nice templates for GitHub Issues as of recently (this is 
another benefit of GH Issues - they have those really nicely working Issue 
Forms - which do a FANTASTIC job of making our issues much higher quality - 
for example, in the forms we instruct the users that if they have no 
reproducible steps, they should open a GitHub Discussion instead - this has 
already happened multiple times). One of the options in the issue form configuration 
is to provide a "BUTTON" instead of a form for some types of issues, which links 
to an external site. We have a link there to the security policy 
https://github.com/apache/airflow/security/policy  which clearly states that no 
GH issues should be opened, but the regular ASF security process should be 
followed (with the email to security@a.o).

I HEARTILY recommend introducing well-thought-out and prepared issue forms when 
you move to GH issues. We already see tremendous improvement in the quality of 
reported issues, and a lot more GitHub discussions opened up instead of issues. 
The nice thing about those forms is that they introduce a bit of "friction". 
It's not just copying or typing out your frustration - you HAVE TO choose the 
version of Airflow, you HAVE TO describe your OS, you HAVE TO choose the 
deployment - and if you did not respond to the reproducibility steps, there is a 
clear "No response was given to that" in your issue, which in the VAST majority 
of cases immediately qualifies the issue to be converted to a discussion (which 
we often do) - especially that during issue entry we e

Re: github issues vs. JIRA for VSCode Debugger - Fw: Apache JIRA vs. github "issues"

2021-09-23 Thread Beckerle, Mike
(sorry if you get this twice. A few email difficulties of late. I'm switching 
email systems for Apache email.)

I think we have a pretty good consensus that we should go with github
issues for the VSCode debugger, and I think Steve Lawrence's point that we can
view this as a trial, and maybe migrate regular daffodil to it eventually, is
well taken also.

I do use JIRA's reports sometimes, like the open-close graph they have,
which shows the trend of opened vs. closed tickets. Github may or may not
have that sort of reporting. But it's not critical.
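
(For what it's worth, a rough way to approximate that open-vs-closed trend from
GitHub data - just a sketch, assuming the gh CLI and jq are installed, with our
repo name used purely for illustration:

  gh issue list -R apache/daffodil --state all --limit 1000 \
    --json createdAt,closedAt > issues.json
  jq '[.[] | select(.closedAt != null)] | length' issues.json   # closed so far
  jq '[.[] | select(.closedAt == null)] | length' issues.json   # still open

Graphing those counts over time would take a little more scripting.)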

On Thu, Sep 23, 2021 at 11:32 AM Interrante, John A (GE Research, US) <
john.interra...@ge.com> wrote:

> +1 for using GitHub issues in Apache's VSCode Debugger repo.
>
> -Original Message-
> From: Beckerle, Mike 
> Sent: Thursday, September 23, 2021 10:30 AM
> To: dev@daffodil.apache.org
> Subject: EXT: github issues vs. JIRA for VSCode Debugger - Fw: Apache JIRA
> vs. github "issues"
>
> So, I inquired about whether we need to use JIRA, or can just use github
> issues.
>
> I got a reply basically saying we can do what we prefer. (reply is below.
> Apache Airflow uses github issues)
>
> The regular Apache Daffodil repo has a pretty big investment in using
> JIRA. I'm not suggesting we consider switching that.
>
> For VSCode, we could stick with using JIRA, but that would mix its issues
> into the ~390 other Apache Daffodil issues.
>
> There are pros and cons to this.
>
> I am wondering if for the VSCode repo (once established), we should just
> use github issues instead.
>
> Thoughts?
>
> -mikeb
>
> ____
> From: Jarek Potiuk 
> Sent: Thursday, September 23, 2021 10:21 AM
> To: Beckerle, Mike 
> Cc: us...@infra.apache.org 
> Subject: Re: Apache JIRA vs. github "issues"
>
> It's quite OK to only use Github Issues/Discussions - we switched to GH in
> Apache Airflow ~ 2 years ago I think.
>
> And a comment from our perspective of a big project that used GitHub
> Issues at its inception, switched to JIRA, and finally returned back to
> GitHub issues when they matured. Others might have a different experience but
> this is ours (and I am pretty sure I am representing the view of pretty much
> the whole Airflow community).
>
> I witnessed just the last switch - from JIRA to GitHub. We stopped using
> JIRA in Apache Airflow in favour of GitHub Issues and Discussions and we
> NEVER looked back. Not a minute. Not even a second. Absolutely no-one
> missed JIRA. Not by far.
>
> That was such an amazing improvement in the overall workflow and
> contributor's engagement. I don't even imagine how we would be able to run
> the project with JIRA.
>
> The overall experience, integration level, overhead needed to manage JIRA
> issues, dual-logging-in and syncing between the two were absolutely
> unmanageable for us. With GitHub Issues we chose to base our "change
> tracking" on PR # rather than Issue # (making issues optional), and it made a whole
> world of difference.
>
> Especially recently, with GitHub Discussions added to the mix and the ability to
> convert issues into discussions (and back) if they are not real issues.
>
> J.
>
>
> On Thu, Sep 23, 2021 at 4:01 PM Beckerle, Mike <
> mbecke...@owlcyberdefense.com<mailto:mbecke...@owlcyberdefense.com>>
> wrote:
> I read an old blog post from infra about increasing github integration.
>
> I am wondering about Apache JIRA, vs. using the issues feature of github,
> for an Apache project repo.
>
> Can we use github's issues feature, or do we have to use Apache's JIRA? Is
> there a policy, or even strong preference on this issue?
>
> Thanks
>
> Mike Beckerle
> Apache Daffodil PMC
>
>
>
>


github issues vs. JIRA for VSCode Debugger - Fw: Apache JIRA vs. github "issues"

2021-09-23 Thread Beckerle, Mike
So, I inquired about whether we need to use JIRA, or can just use github issues.

I got a reply basically saying we can do what we prefer. (reply is below. 
Apache Airflow uses github issues)

The regular Apache Daffodil repo has a pretty big investment in using JIRA. I'm 
not suggesting we consider switching that.

For VSCode, we could stick with using JIRA, but that would mix its issues into 
the ~390 other Apache Daffodil issues.

There are pros and cons to this.

I am wondering if for the VSCode repo (once established), we should just use 
github issues instead.

Thoughts?

-mikeb


From: Jarek Potiuk 
Sent: Thursday, September 23, 2021 10:21 AM
To: Beckerle, Mike 
Cc: us...@infra.apache.org 
Subject: Re: Apache JIRA vs. github "issues"

It's quite OK to only use Github Issues/Discussions - we switched to GH in 
Apache Airflow ~ 2 years ago I think.

And a comment from our perspective of a big project that used GitHub Issues at 
its inception, switched to JIRA, and finally returned back to GitHub issues when 
they matured. Others might have a different experience but this is ours (and I am 
pretty sure I am representing the view of pretty much the whole Airflow community).

I witnessed just the last switch - from JIRA to GitHub. We stopped using JIRA 
in Apache Airflow in favour of GitHub Issues and Discussions and we NEVER 
looked back. Not a minute. Not even a second. Absolutely no-one missed JIRA. 
Not by far.

That was such an amazing improvement in the overall workflow and contributor's 
engagement. I don't even imagine how we would be able to run the project with 
JIRA.

The overall experience, integration level, overhead needed to manage JIRA 
issues, dual-logging-in and syncing between the two were absolutely 
unmanageable for us. With GitHub Issues we chose to base our "change tracking" 
on PR # rather than Issue # (making issues optional), and it made a whole world of 
difference.

Especially recently, with GitHub Discussions added to the mix and the ability to 
convert issues into discussions (and back) if they are not real issues.

J.


On Thu, Sep 23, 2021 at 4:01 PM Beckerle, Mike 
mailto:mbecke...@owlcyberdefense.com>> wrote:
I read an old blog post from infra about increasing github integration.

I am wondering about Apache JIRA, vs. using the issues feature of github, for 
an Apache project repo.

Can we use github's issues feature, or do we have to use Apache's JIRA? Is 
there a policy, or even strong preference on this issue?

Thanks

Mike Beckerle
Apache Daffodil PMC





FYI: Apache emails blocked by Microsoft email spam filters

2021-09-23 Thread Beckerle, Mike
As you may have heard, Apache email and Microsoft's spam filtering systems 
don't mix:


There have been ongoing problems with Microsoft domains partly because many of 
their users report our legitimate email as spam.

The Infrastructure team have been trying to get the bans removed, but with no 
success. At present mails from one of the two ASF outbound servers are being 
rejected; i.e. on average 50% of mails will not be delivered.

The following Microsoft domains are all affected (as of August 2021):

 *   hotmail.com
 *   live.com
 *   outlook.com

I have now tried to set up specific mail rules and specific "safe senders" for 
apache.org, and nothing seems to prevent many apache emails a day from going 
into the Microsoft Outlook Junk email folder.

At this point I have asked Apache INFRA to make this problem better known. 
The only page I can find discussing it requires an Apache sign-on.

The only workaround is not to use a Microsoft email system (e.g., my company 
email is Outlook 365) for Apache emails.

-mike beckerle
Apache Daffodil PMC



Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Try Java 17 LTS

2021-09-22 Thread Beckerle, Mike
Java 17 is officially GA as of Sept 14. It is a Long-term-support (LTS) release.

I recommend developers download it and start using it in place of Java 16.

We need to officially support Java 8, 11, and 17 now.

If you are still using Java 8 or 11, I would also mention that when I switched 
to mostly using Java 16, I saw a noticeable performance increase, which I expect 
will be sustained in Java 17. YMMV.


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



verify licenses on dependencies for vscode debugger

2021-09-17 Thread Beckerle, Mike
I recall someone verifying the licenses on dependencies. I can't find that 
message now.

However, this must be a transitive verification, so there are quite a few.

The build.sbt has only:

  "ch.qos.logback" % "logback-classic" % "1.2.3",
  "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1",
  "co.fs2" %% "fs2-io" % "3.0.4",
  "com.monovore" %% "decline-effect" % "2.1.0",
  "org.typelevel" %% "log4cats-slf4j" % "2.1.0",

for the typescript code, I see a bunch in package.json.

Action Required: Can someone please verify the licenses of all the dependencies 
transitively and send me the list?

This is specifically what the IP Clearance checklist asks:

  Check and make sure that all items depended upon by the
  project is covered by one or more of the following approved
  licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or
  something with essentially the same terms.

I'd like the list of what we checked to include it in the IP Clearance 
checklist document.

Note: there used to be an sbt plugin that pulled all the license files 
recursively for sbt dependency chains. I recall we used, or attempted to use, 
it for daffodil at one time.
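
If we want to automate this again, a minimal sketch assuming the
sbt-license-report plugin (the coordinates/version below are from memory and
would need checking):

  # assumes project/plugins.sbt contains something like:
  #   addSbtPlugin("com.typesafe.sbt" % "sbt-license-report" % "1.2.0")
  sbt dumpLicenseReport
  # each project's report, listing every transitive dependency with its
  # declared license, should land under target/license-reports/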





Re: Fwd: FW: DFDL: potential problem

2021-09-17 Thread Beckerle, Mike
Apologies for the tardy reply. I missed parts of this thread due to a spam email 
filter.

(I learned that MS Outlook 365 is misclassifying some Apache email as junk 
email.)

Here's the link to what is proposed for checksum calculations; it has links 
to some mock-ups showing how this checksum/crc stuff is supposed to work.

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Checksums%2C+CRC%2C+Parity+-+Layering+Enhancements

I do think this could be used to couple a generic hash into data that is 
verified at unparse.
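
(As an aside: the four-step check Attila describes below can be scripted
against the Daffodil CLI - a rough sketch, with hypothetical schema/data file
names, checking the exit code at each step:

  daffodil parse   -s myformat.dfdl.xsd -o infoset.xml data.bin   # step 1
  daffodil unparse -s myformat.dfdl.xsd -o data2.bin infoset.xml  # step 2
  cmp data.bin data2.bin                                          # step 3
  xmllint --noout --schema myformat.dfdl.xsd infoset.xml          # step 4

It is step 3 that caught the problem in his case.)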



From: Steve Lawrence 
Sent: Monday, August 30, 2021 9:50 AM
To: dev@daffodil.apache.org 
Subject: Re: Fwd: FW: DFDL: potential problem

Interesting idea.

I was thinking you could do something like this once we have this new
feature implemented:

  [DFDL schema example stripped by the mail archive; it referenced the
  http://www.ogf.org/dfdl/ namespace]

So we parse and checksum the entire data format, add the checksum to the
infoset via input value calc, and then add an assert that the calculated
checksum matches the value in the infoset.

On parse, these two should always be the same. But on unparse, it's
possible they could be different and the assert would fail.
Unfortunately, this doesn't actually work because asserts aren't evaluated
during unparse.

This seems like a reasonable use case for asserts during unparse, and I
imagine there are others, so maybe that's a feature worth considering to
allow this type of unparse validation.




On 8/25/21 9:20 AM, Attila Horvath wrote:
>
> *Subject:* DFDL: potential problem
>
> ALCON
>
> re: idea for checksum calculations in DFDL
> 
>
> We may have a potential ‘situation’ as part of our DFDL/Daffodil offering as
> follows…
>
> My DFDL schema development process consists of examining the exit codes of a
> four (4) part mechanism:
>
>  1. DFDL parsing – “Houston, we have a go.”
>  2. DFDL unparsing – “Houston, we have a go.”
>  3. *End-to-end source/destination data comparison – “Houston, we have a
> problem.”*
>  4. Intermediate xml validation against reconstituted data – “Houston, we
> have a go.”
>
> I have an *_unintentional_* error in my DFDL schema - unfortunately the
> data/schema that created this situation is lost. Per above, both parse and
> unparse execute successfully, and xmllint validates Daffodil’s intermediate XML
> file successfully against the reconstituted/unparsed data as well as against the
> DFDL [erroneous] schema.
>
> However, the source and target data are *_NOT_* congruent. This is one
> situation I did not anticipate.
>
> This means our model and incorporation of Daffodil leaves [albeit] a
> /possibility/ of having an erroneous DFDL schema that will ultimately
> send data end-to-end, but because the two [gateway] ends do not
> communicate directly w/ each other, there is no way for the destination gateway
> to verify that the data is identical w/ the data received by the source gateway.
>
> To address the above, and perhaps along the lines of 'checksum calculations' re:
> the IPV4 element, what is the collective opinion of having a SHASUM capability
> added to Daffodil, allowing the parser to optionally ("invisibly") incorporate a
> SHASUM in the intermediate XML file, allowing the destination unparser to
> validate the reconstituted data against the incorporated SHASUM?
>
> Perhaps a lame suggestion: could Daffodil optionally insert a comment tag while
> parsing, identifying it as a Daffodil-inserted shasum comment, which the
> unparser can identify and use to validate the reconstituted data?
>
> Thx in advance,
>
> v/r
>
> Attila
>
>



Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-16 Thread Beckerle, Mike
I think I have a good example for you to use.

The plc4x apache project played around with daffodil and created a DFDL schema 
for an industrial control protocol called s7.

So it's a not too difficult binary data format. But it is a real format.

Everything is already apache licensed, and at Owl we test it against every 
Daffodil release, so we know it continues to work.

tarball attached.



From: Beckerle, Mike 
Sent: Thursday, September 16, 2021 2:56 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

Suggest you just excise this file, and any tests that depend on it for now.

We can wire in a new example workspace and add in tests subsequently before the 
first "release" of this.






From: John Wass 
Sent: Thursday, September 16, 2021 10:48 AM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

> I know of one file in the repo which will have to be removed which is the
jpeg.dfdl.xsd file, which is there just as an example workspace.

I assume this issue remains, and needs to be addressed prior to giving this
the done stamp.

We could just remove that sample workspace, the setup is trivial and is
addressed in the docs, but that schema and jpg also exist for unit tests.

Looking through the test resources in Daffodil now, any suggestions on a
good candidate are welcomed.



On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike 
wrote:

> I know of one file in the repo which will have to be removed which is the
> jpeg.dfdl.xsd file, which is there just as an example workspace.
>
> The copyright and provisions of that are not compatible with Apache
> licensing.
>
> We can find a DFDL schema that we created that has Apache license to use
> instead.
>
> For the other files under src, server, and build, can we generate a list
> of files identifying which are:
>
> (a) original MIT-licensed, unmodified
> (b) new - can be ASL
> (c) blended - started from MIT-licensed source, modified with
> daffodil-vscode-specific changes.
>
> It is these blended files that are the problematic ones.
>
>
>
> 
> From: Steve Lawrence 
> Sent: Thursday, September 9, 2021 1:38 PM
> To: dev@daffodil.apache.org 
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> Correct. For more information about Apache license compatibility:
>
>   https://www.apache.org/legal/resolved.html
>
> MIT is Category A and is fine. EPL is Category B and is also okay, but
> generally only in its binary form. So these top-level dependencies look
> okay, assuming their transitive dependencies are also okay.
>
> We'll also need to verify the licenses of all code in the repo.
> Hopefully little of that is original microsoft MIT and can be granted to
> ASF and relicensed.
>
>
> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> > The requirement is that the entire dependency tree (transitively)
> cannot depend on any software that has an Apache-incompatible (aka
> restrictive) license.
> >
> > So we need the transitive closure of all dependencies.
> >
> >
> > 
> > From: Adam Rosien 
> > Sent: Thursday, September 9, 2021 12:44 PM
> > To: dev@daffodil.apache.org 
> > Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
> >
> > (I don't understand the requirements of licensing + transitive
> > dependencies, so I'm giving some surface level license info)
> >
> > "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> > http://logback.qos.ch/license.html
> > "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL
> 1.0
> > "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> > "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> > "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
> >
> > On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien  wrote:
> >
> >> I can relay the list of dependencies and their licenses.
> >>
> >> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence 
> >> wrote:
> >>
> >>> I personally don't care too much about having the existing git history
> >>> once it's part of ASF, especially if it makes things any easier (as you
> >>> mention, squash/rebase can be difficult through merges). So I'd say we
> >>> just do plan B--create a tarball of the current state (without 

Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-16 Thread Beckerle, Mike
Suggest you just excise this file, and any tests that depend on it for now.

We can wire in a new example workspace and add in tests subsequently before the 
first "release" of this.






From: John Wass 
Sent: Thursday, September 16, 2021 10:48 AM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

> I know of one file in the repo which will have to be removed which is the
jpeg.dfdl.xsd file, which is there just as an example workspace.

I assume this issue remains, and needs to be addressed prior to giving this
the done stamp.

We could just remove that sample workspace, the setup is trivial and is
addressed in the docs, but that schema and jpg also exist for unit tests.

Looking through the test resources in Daffodil now, any suggestions on a
good candidate are welcomed.



On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike 
wrote:

> I know of one file in the repo which will have to be removed which is the
> jpeg.dfdl.xsd file, which is there just as an example workspace.
>
> The copyright and provisions of that are not compatible with Apache
> licensing.
>
> We can find a DFDL schema that we created that has Apache license to use
> instead.
>
> For the other files under src, server, and build, can we generate a list
> of files identifying which are:
>
> (a) original MIT-licensed, unmodified
> (b) new - can be ASL
> (c) blended - started from MIT-licensed source, modified with
> daffodil-vscode-specific changes.
>
> It is these blended files that are the problematic ones.
>
>
>
> 
> From: Steve Lawrence 
> Sent: Thursday, September 9, 2021 1:38 PM
> To: dev@daffodil.apache.org 
> Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
>
> Correct. For more information about Apache license compatibility:
>
>   https://www.apache.org/legal/resolved.html
>
> MIT is Category A and is fine. EPL is Category B and is also okay, but
> generally only in its binary form. So these top-level dependencies look
> okay, assuming their transitive dependencies are also okay.
>
> We'll also need to verify the licenses of all code in the repo.
> Hopefully little of that is original microsoft MIT and can be granted to
> ASF and relicensed.
>
>
> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> > The requirement is that the entire dependency tree (transitively)
> cannot depend on any software that has an Apache-incompatible (aka
> restrictive) license.
> >
> > So we need the transitive closure of all dependencies.
> >
> >
> > 
> > From: Adam Rosien 
> > Sent: Thursday, September 9, 2021 12:44 PM
> > To: dev@daffodil.apache.org 
> > Subject: Re: daffodil-vscode - how to package and identify the
> contribution - some git questions
> >
> > (I don't understand the requirements of licensing + transitive
> > dependencies, so I'm giving some surface level license info)
> >
> > "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> > http://logback.qos.ch/license.html
> > "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL
> 1.0
> > "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> > "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> > "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
> >
> > On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien  wrote:
> >
> >> I can relay the list of dependencies and their licenses.
> >>
> >> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence 
> >> wrote:
> >>
> >>> I personally don't care too much about having the existing git history
> >>> once it's part of ASF, especially if it makes things any easier (as you
> >>> mention, squash/rebase can be difficult through merges). So I'd say we
> >>> just do plan B--create a tarball of the current state (without the git
> >>> history), and the content of that tarball is what goes through the IP
> >>> clearance process, and is the content of the initial commit when adding
> >>> to the apache/daffodil-vscode repo.
> >>>
> >>> Note that I think the incubator will still want access to the existing
> >>> repo so they can view the full git history. Understanding where
> >>> everything came from and verifying the provenance is important to
> >>> ensuring we have all the appropriate CLA's. So while the tarball is
> >>> maybe

Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-10 Thread Beckerle, Mike
How hard is it to refactor these 6 files so that all new code is in separate 
files from all preserved original code?

Assume one-liner changes to original files (like a call to MockDebugger changed 
to a call to DaffodilDebugger) are allowed.

We either have to separate these 6 blended files, or convince legal and the 
incubator-pmc that blended files are ok because they originally had the MIT 
license.

I definitely don't want to bother with that unless the refactoring exercise 
here is hard.

From: John Wass 
Sent: Friday, September 10, 2021 1:02 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

Mike - Those were renames from the original versions that had "mock" in
their names.

commit 383fd4882a8fe51adf21b5ae31fe252056800447

On Fri, Sep 10, 2021 at 12:54 PM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

>
> John Wass said:
>
> I had a few more (6) source files as modified..
>
> extension.ts
> debugAdapter.ts
> daffodilRuntime.ts
> daffodilDebug.ts
> adapter.test.ts
> activateDaffodilDebug.ts
>
> The 3 files with daffodil or Daffodil in their names, aren't those new
> files? Or were those based on provided files, but the file was renamed as
> well as the content modified?
>
> ...mikeb
>
>


Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-10 Thread Beckerle, Mike

John Wass said:

I had a few more (6) source files as modified..

extension.ts
debugAdapter.ts
daffodilRuntime.ts
daffodilDebug.ts
adapter.test.ts
activateDaffodilDebug.ts

The 3 files with daffodil or Daffodil in their names, aren't those new files? 
Or were those based on provided files, but the file was renamed as well as the 
content modified?

...mikeb



Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-09 Thread Beckerle, Mike
So via some git trickery I was able to determine the "blended" files.

I'm ignoring the various configuration files which are generally json files.

Of the ".ts" files only 3 are blended:

src/debugAdapter.ts - 72 lines - only maybe 6 lines are different
src/extension.ts - 179 lines
src/tests/adapter.test.ts - 137 lines (50 of which are commented-out code)

The delta between these files and the original files of the same name is 
larger than expected due to changes in whitespace and removal of ";" at the ends 
of lines (which I guess are optional in many places in TypeScript).

It would seem an IDE (probably vscode!) decided to restyle/reindent this code.

So it's a bit hard to figure out what the "real" deltas are.

src/debugAdapter.ts appears to be only trivially different. The name 
MockDebugSession was replaced by DaffodilDebugSession, and "./mockDebug" was 
changed to "./daffodilDebug".

The other two files do appear to be where all the real blended code is.
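
For the record, roughly how I compared them - a sketch assuming the fork point
was Microsoft's vscode-mock-debug example (URL and branch name may differ):

  git remote add upstream https://github.com/microsoft/vscode-mock-debug.git
  git fetch upstream
  # -w ignores the whitespace-only churn from the IDE restyling
  git diff -w upstream/main -- src/extension.ts src/tests/adapter.test.ts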




From: Beckerle, Mike 
Sent: Thursday, September 9, 2021 4:21 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

Whether it's a PR or series of PRs, or a software grant, that still doesn't 
resolve the issue of the blended files, which are part MIT-licensed original 
code and part new code deltas by the daffodil-vscode contributors.

We need to understand whether those blended files can be teased apart somehow 
so that it is clear going forward what is MIT-licensed library code and what is 
Apache-licensed.

I just did a grep -R -i microsoft in a clone of the openwhisk-vscode-extension 
and got zero hits. So no files still carry a Microsoft copyright, and in fact 
their NOTICES.txt file does not indicate any dependency on MIT-licensed code at 
all. So I think openwhisk-vscode-extension is not going to help us figure out 
how to surf this issue.



From: Steve Lawrence 
Sent: Thursday, September 9, 2021 3:54 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

The concern is that this code was developed outside of Apache and so
didn't follow standard Apache process. From the IP clearance page:

https://incubator.apache.org/ip-clearance/

> Any code that was developed outside of the ASF SVN repository and
> our public mailing lists must be processed like this, even if the
> external developer is already an ASF committer.

I suppose that submitting it as a PR does follow some of that process,
but there is maybe less assurance of ownership. Because it was not
developed in an ASF repository, that code is presumed to be owned by
you, multiple developers, or a company, and so that ownership must be
granted to ASF via the IP clearance process, with appropriate software
grant, CLA's, etc. (At least, that's my admittedly limited understanding
of the process).

- Steve


On 9/9/21 3:34 PM, John Wass wrote:
> Couldn't we (the vscode contributors) submit a series of PRs against the
> new repo to move the code, and just archive the example repo as-is?
>
> I noted some thoughts on that a while back
> https://github.com/jw3/example-daffodil-vscode/issues/77
>
>
>
> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike 
> wrote:
>
>> I know of one file in the repo which will have to be removed which is the
>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>
>> The copyright and provisions of that are not compatible with Apache
>> licensing.
>>
>> We can find a DFDL schema that we created that has Apache license to use
>> instead.
>>
>> For the other files under src, server, and build, can we generate a list
>> of files identifying which are:
>>
>> (a) original MIT-licensed, unmodified
>> (b) new - can be ASL
>> (c) blended - started from MIT-licensed source, modified with
>> daffodil-vscode-specific changes.
>>
>> It is these blended files that are the problematic ones.
>>
>>
>>
>> 
>> From: Steve Lawrence 
>> Sent: Thursday, September 9, 2021 1:38 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> Correct. For more information about Apache license compatibility:
>>
>>   https://www.apache.org/legal/resolved.html
>>
>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>> generally only in its binary form. So these top-level dependencies look
>> okay, assuming their transitive dependencies are also okay.
>>
>> We'll also need to verify the licenses of all code in the repo.
>

Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-09 Thread Beckerle, Mike
Whether it's a PR or series of PRs, or a software grant, that still doesn't 
resolve the issue of the blended files, which are part MIT-licensed original 
code and part new code deltas by the daffodil-vscode contributors.

We need to understand whether those blended files can be teased apart somehow 
so that it is clear going forward what is MIT-licensed library code and what is 
Apache-licensed.

I just did a grep -R -i microsoft in a clone of the openwhisk-vscode-extension 
and got zero hits. So no files still carry a Microsoft copyright, and in fact 
their NOTICES.txt file does not indicate any dependency on MIT-licensed code at 
all. So I think openwhisk-vscode-extension is not going to help us figure out 
how to surf this issue.



From: Steve Lawrence 
Sent: Thursday, September 9, 2021 3:54 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

The concern is that this code was developed outside of Apache and so
didn't follow standard Apache process. From the IP clearance page:

https://incubator.apache.org/ip-clearance/

> Any code that was developed outside of the ASF SVN repository and
> our public mailing lists must be processed like this, even if the
> external developer is already an ASF committer.

I suppose that submitting it as a PR does follow some of that process,
but there is maybe less assurance of ownership. Because it was not
developed in an ASF repository, that code is presumed to be owned by
you, multiple developers, or a company, and so that ownership must be
granted to ASF via the IP clearance process, with appropriate software
grant, CLA's, etc. (At least, that's my admittedly limited understanding
of the process).

- Steve


On 9/9/21 3:34 PM, John Wass wrote:
> Couldn't we (the vscode contributors) submit a series of PRs against the
> new repo to move the code, and just archive the example repo as-is?
>
> I noted some thoughts on that a while back
> https://github.com/jw3/example-daffodil-vscode/issues/77
>
>
>
> On Thu, Sep 9, 2021 at 2:11 PM Beckerle, Mike 
> wrote:
>
>> I know of one file in the repo which will have to be removed which is the
>> jpeg.dfdl.xsd file, which is there just as an example workspace.
>>
>> The copyright and provisions of that are not compatible with Apache
>> licensing.
>>
>> We can find a DFDL schema that we created that has Apache license to use
>> instead.
>>
>> For the other files under src, server, and build, can we generate a list
>> of files identifying which are:
>>
>> (a) original MIT-licensed, unmodified
>> (b) new - can be ASL
>> (c) blended - started from MIT-licensed source, modified with
>> daffodil-vscode-specific changes.
>>
>> It is these blended files that are the problematic ones.
>>
>>
>>
>> 
>> From: Steve Lawrence 
>> Sent: Thursday, September 9, 2021 1:38 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>
>> Correct. For more information about Apache license compatibility:
>>
>>   https://www.apache.org/legal/resolved.html
>>
>> MIT is Category A and is fine. EPL is Category B and is also okay, but
>> generally only in its binary form. So these top-level dependencies look
>> okay, assuming their transitive dependencies are also okay.
>>
>> We'll also need to verify the licenses of all code in the repo.
>> Hopefully little of that is original microsoft MIT and can be granted to
>> ASF and relicensed.
>>
>>
>> On 9/9/21 1:30 PM, Beckerle, Mike wrote:
>>> The requirement is that the entire dependency tree (transitively)
>> cannot depend on any software that has an Apache-incompatible (aka
>> restrictive) license.
>>>
>>> So we need the transitive closure of all dependencies.
>>>
>>>
>>> 
>>> From: Adam Rosien 
>>> Sent: Thursday, September 9, 2021 12:44 PM
>>> To: dev@daffodil.apache.org 
>>> Subject: Re: daffodil-vscode - how to package and identify the
>> contribution - some git questions
>>>
>>> (I don't understand the requirements of licensing + transitive
>>> dependencies, so I'm giving some surface level license info)
>>>
>>> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
>>> http://logback.qos.ch/license.html
>>> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL
>> 1.0
>>> "co.fs2" %% "fs2-io" % "3.0.4" - MIT

Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-09 Thread Beckerle, Mike
I know of one file in the repo which will have to be removed: the 
jpeg.dfdl.xsd file, which is there just as an example workspace.

The copyright and provisions of that are not compatible with Apache licensing.

We can find a DFDL schema that we created that has Apache license to use 
instead.

For the other files under src, server, and build, can we generate a list of 
files identifying which are:

(a) original MIT-licensed, unmodified
(b) new - can be ASL
(c) blended - started from MIT-licensed source, modified with 
daffodil-vscode-specific changes.

It is these blended files that are the problematic ones.




From: Steve Lawrence 
Sent: Thursday, September 9, 2021 1:38 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

Correct. For more information about Apache license compatibility:

  https://www.apache.org/legal/resolved.html

MIT is Category A and is fine. EPL is Category B and is also okay, but
generally only in its binary form. So these top-level dependencies look
okay, assuming their transitive dependencies are also okay.

We'll also need to verify the licenses of all code in the repo.
Hopefully little of that is original microsoft MIT and can be granted to
ASF and relicensed.


On 9/9/21 1:30 PM, Beckerle, Mike wrote:
> The requirement is that the entire dependency tree (transitively) cannot 
> depend on any software that has an Apache-incompatible (aka restrictive) 
> license.
>
> So we need the transitive closure of all dependencies.
>
>
> 
> From: Adam Rosien 
> Sent: Thursday, September 9, 2021 12:44 PM
> To: dev@daffodil.apache.org 
> Subject: Re: daffodil-vscode - how to package and identify the contribution - 
> some git questions
>
> (I don't understand the requirements of licensing + transitive
> dependencies, so I'm giving some surface level license info)
>
> "ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
> http://logback.qos.ch/license.html
> "com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
> "co.fs2" %% "fs2-io" % "3.0.4" - MIT
> "com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
> "org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0
>
> On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien  wrote:
>
>> I can relay the list of dependencies and their licenses.
>>
>> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence 
>> wrote:
>>
>>> I personally don't care too much about having the existing git history
>>> once it's part of ASF, especially if it makes things any easier (as you
>>> mention, squash/rebase can be difficult through merges). So I'd say we
>>> just do plan B--create a tarball of the current state (without the git
>>> history), and the content of that tarball is what goes through the IP
>>> clearance process, and is the content of the initial commit when adding
>>> to the apache/daffodil-vscode repo.
>>>
>>> Note that I think the incubator will still want access to the existing
>>> repo so they can view the full git history. Understanding where
>>> everything came from and verifying the provenance is important to
>>> ensuring we have all the appropriate CLA's. So while the tarball is
>>> maybe what is officially voted on, they will want access to the repo.
>>>
>>> That said, I don't think we are going to get CLA's for any Microsoft
>>> contributed code. So either all Microsoft contributed code will need to
>>> be kept MIT, or removed from the codebase. And it feels a bit odd to
>>> grant something to ASF where the original codebase stays MIT and isn't
>>> part of that grant.
>>>
>>> I think understanding how much code still exists that is Microsoft/MIT
>>> is going to be important to getting this through the IP clearance process.
>>>
>>> So I'm curious how much of that original Microsoft code still exists? I
>>> assume since it was just example code it has mostly been replaced? If
>>> that's the case, we could potentially say Microsoft has no ownership of
>>> this code, and so their CLA and MIT license aren't necessary?
>>>
>>> We should also have a good understanding of the dependencies. If any of
>>> them are not compatible with ALv2, then going through this process isn't
>>> even worth it until they are replaced. Do you have a list of the
>>> dependencies?
>>>
>>>
>>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>>>> So the daff

Re: daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-09 Thread Beckerle, Mike
The requirement is that the entire dependency tree (transitively) cannot 
depend on any software that has an Apache-incompatible (aka restrictive) 
license.

So we need the transitive closure of all dependencies.
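
A sketch of one way to list that closure, assuming sbt 1.4+ (which bundles the
dependency-graph plugin; older sbt needs the sbt-dependency-graph plugin added):

  sbt dependencyTree   # prints each module with its transitive dependencies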



From: Adam Rosien 
Sent: Thursday, September 9, 2021 12:44 PM
To: dev@daffodil.apache.org 
Subject: Re: daffodil-vscode - how to package and identify the contribution - 
some git questions

(I don't understand the requirements of licensing + transitive
dependencies, so I'm giving some surface level license info)

"ch.qos.logback" % "logback-classic" % "1.2.3" - EPL
http://logback.qos.ch/license.html
"com.microsoft.java" % "com.microsoft.java.debug.core" % "0.31.1" - EPL 1.0
"co.fs2" %% "fs2-io" % "3.0.4" - MIT
"com.monovore" %% "decline-effect" % "2.1.0" - APL 2.0
"org.typelevel" %% "log4cats-slf4j" % "2.1.0" - APL 2.0

On Thu, Sep 9, 2021 at 9:35 AM Adam Rosien  wrote:

> I can relay the list of dependencies and their licenses.
>
> On Thu, Sep 9, 2021 at 9:20 AM Steve Lawrence 
> wrote:
>
>> I personally don't care too much about having the existing git history
>> once it's part of ASF, especially if it makes things any easier (as you
>> mention, squash/rebase can be difficult through merges). So I'd say we
>> just do plan B--create a tarball of the current state (without the git
>> history), and the content of that tarball is what goes through the IP
>> clearance process, and is the content of the initial commit when adding
>> to the apache/daffodil-vscode repo.
>>
>> Note that I think the incubator will still want access to the existing
>> repo so they can view the full git history. Understanding where
>> everything came from and verifying the provenance is important to
>> ensuring we have all the appropriate CLA's. So while the tarball is
>> maybe what is officially voted on, they will want access to the repo.
>>
>> That said, I don't think we are going to get CLA's for any Microsoft
>> contributed code. So either all Microsoft contributed code will need to
>> be kept MIT, or removed from the codebase. And it feels a bit odd to
>> grant something to ASF where the original codebase stays MIT and isn't
>> part of that grant.
>>
>> I think understanding how much code still exists that is Microsoft/MIT
>> is going to be important to getting this through the IP clearance process.
>>
>> So I'm curious how much of that original Microsoft code still exists? I
>> assume since it was just example code it has mostly been replaced? If
>> that's the case, we could potentially say Microsoft has no ownership of
>> this code, and so their CLA and MIT license aren't necessary?
>>
>> We should also have a good understanding of the dependencies. If any of
>> them are not compatible with ALv2, then going through this process isn't
>> even worth it until they are replaced. Do you have a list of the
>> dependencies?
>>
>>
>> On 9/9/21 11:16 AM, Beckerle, Mike wrote:
>> > So the daffodil-vscode code-base wants to be granted to become part of
>> the
>> > Daffodil project.
>> >
>> > One question arises which is "what is the contribution?" exactly.
>> >
>> > The normal way this is identified is by creating a tarball of the
>> source files
>> > and specifying an sha or md5 hash of that file.
>> >
>> > However, this code base is perhaps different from usual.
>> >
>> > It started by creating a detached fork of the vscode debugger example
>> code base.
>> > This is MIT-Licensed which is a compatible license.
>> >
>> > The files are then edited. There are around 100 commits on top of the
>> base that
>> > came from the vscode debugger repository.
>> >
>> > So the contribution is that set of 100 commits - the
>> patches/change-sets they
>> > represent.
>> >
>> > These commits often edit the original files of the vscode debugger
>> example to
>> > add the daffodil-specific functionality. That is, the contribution
>> material is
>> > in several cases intermingled in the lines of the existing files.
>> That's ok I
>> > think so long as the modified file had MIT license.
>> >
>> > There's some value in preserving the 100 commits by our contributors,
>> not
>> > squashing it down to one commit, though if it's really not sensible to
>> proceed
>> > otherwise, we can choose to squash it down to one commit.

daffodil-vscode - how to package and identify the contribution - some git questions

2021-09-09 Thread Beckerle, Mike
So the daffodil-vscode code-base wants to be granted to become part of the 
Daffodil project.

One question arises which is "what is the contribution?" exactly.

The normal way this is identified is by creating a tarball of the source files 
and specifying an sha or md5 hash of that file.

However, this code base is perhaps different from usual.

It started by creating a detached fork of the vscode debugger example code 
base. This is MIT-Licensed which is a compatible license.

The files are then edited. There are around 100 commits on top of the base that 
came from the vscode debugger repository.

So the contribution is that set of 100 commits - the patches/change-sets they 
represent.

These commits often edit the original files of the vscode debugger example to 
add the daffodil-specific functionality. That is, the contribution material is 
in several cases intermingled in the lines of the existing files.  That's ok I 
think so long as the modified file had MIT license.

There's some value in preserving the 100 commits by our contributors, not 
squashing it down to one commit, though if it's really not sensible to proceed 
otherwise, we can choose to squash it down to one commit.

Furthermore, the vscode debugger example repo itself had many commits in it. 
The current daffodil-vscode repo preserves all these commits as well. I don't 
see value in preserving these commits, and would rather they were squashed into 
a single "starting point" commit, with a dependencies file specifying the 
githash where we forked from, just so we can refer back if necessary.

So a starting suggestion (subject to discussion of other alternatives) is 
this:

Plan A:

  1.  squash all commits up to and including the last Microsoft commit, 
together into one.
  2.  rebase the remaining commits on top of that.
 *   I'm a bit worried about this rebase. There are merge commits, etc. in 
the history. I'm not sure this will just all rebase while preserving all the 
commits, but maybe it will "just work"
  3.  create a "patch set" corresponding to the 100 or so commits that make up 
the "contribution" (a sketch follows this list).
 *   I don't know if this is even feasible for this many commits.
  4.  create a tar/zip of this aggregate patch set.
  5.  compute an md5 of this patch set.
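
A sketch of steps 3-5, where BASE is a stand-in for the squashed starting-point
commit from step 1:

  git format-patch BASE..HEAD -o contribution-patches/
  tar czf contribution-patches.tar.gz contribution-patches/
  md5sum contribution-patches.tar.gz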

The patch set tar/zip file and its md5 hash are "the granted software".

The problem with this idea is that there's no obvious way to review a patch 
set, shy of applying it.

A better way may be to change steps 3 - 5 above to

Plan B:

3. push the main branch to a new empty git repository
The point of this is to remove all historic stuff from the repository, 
i.e., have a minimal git repo that contains only the contribution and the 
single other commit it must be based on.

4. create a tarball of this git repository, and md5 hash of it

5. document that the contribution is from githash X (after the first commit) to 
githash Y (the final commit) of this repository

This has the advantage that the contribution is a self-contained review-able 
thing.
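
A sketch of Plan B, with placeholder URLs/names, assuming the history above the
contribution has already been squashed per steps 1 and 2:

  # step 3: push main to a fresh, empty repository (URL is a placeholder)
  git push git@github.com:apache/daffodil-vscode.git main
  # steps 4 and 5: snapshot that minimal repo and hash it
  git clone --bare git@github.com:apache/daffodil-vscode.git daffodil-vscode.git
  tar czf daffodil-vscode-contribution.tar.gz daffodil-vscode.git
  md5sum daffodil-vscode-contribution.tar.gz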

Other ideas are welcome. (Plans C, D, etc) The only requirements I know of are:

  1.  a single file containing the contribution, and its md5 hash
  2.  a sensible way one can review the contents of this contribution file
  3.  preserve history of derivation from the vscode debugger example.







Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: Please review mock up idea for checksum calculations in DFDL

2021-08-25 Thread Beckerle, Mike
One further comment at the end.


From: Steve Lawrence 
Sent: Monday, August 23, 2021 2:23 PM
To: dev@daffodil.apache.org 
Subject: Re: Please review mock up idea for checksum calculations in DFDL

On 8/23/21 1:51 PM, Beckerle, Mike wrote:
> Comments below see @@@mb
>
> 
> From: Steve Lawrence 
> Sent: Monday, August 9, 2021 12:18 PM
> To: dev@daffodil.apache.org 
> Subject: Re: Please review mock up idea for checksum calculations in DFDL
>
> Some comments:
>
> 1) I like the idea that the layers write to a variable, but it seems
> like the variables are hard coded in the layer transformer? What are
> your thoughts on having the variable defined in a property so that the
> user has more control over the naming/definition of it, maybe via
> something like dfdlx:runtimeProperties? For example:
>
>   [XML example stripped by the mail archive; an element annotated with
>   dfdlx:runtimeProperties="resultVariable=checksumPart1"]
>
> @@@ given that a layer transform can be defined with a unique namespace 
> defined by way of a URI, there's never a need to be
> concerned about naming conflicts. So I think the ability to choose the variable 
> names and provide them is overkill.

This is maybe a bit contrived, but one benefit of some configurability
is that if you have a format with two of the same checksums for
different parts of the data, you don't need newVariableInstance stuff.
For example:

  [XML example stripped by the mail archive]
So it's just a bit cleaner looking. Though, I'm not sure that's a strong
argument for configuring the variables. I imagine in most formats where
there are multiple of the same checksums, they're in an array, and you'd
need newVariableInstance since the number of checksums isn't known.

I think this is a "let's see" kind of issue. We can use hardwired variables for 
now, and add a feature later to pass in QNames of variables for the layer to 
use if we find it too clumsy.

...



Re: trying to rerun checks on PR - no option for it?

2021-08-24 Thread Beckerle, Mike

>> Workflows will not run on pull_request activity if the pull request
>> has a merge conflict. The merge conflict must be resolved first.

> I think users do just need to rebase and fix the conflicts.

But that means the comment history of the PR will be lost, as one will have to 
force-push to the PR branch.

I am thinking we could fix our workflow to address this. I am not sure it is 
worth it though.

When creating a bug branch, also create a "point of departure" branch marking 
the point of departure from master.
E.g., to fix bug  create daf--master from master, and then daf--bug 
from daf--master.

PRs are created for merging back to this departure branch (daf--master) and 
so will never have conflicts. So CI tests will always run.

Feedback on the changes would then be independent of any conflicts with ongoing 
things merging to the master branch. Once all changes are complete, the changes 
can be squashed and merged back to the point-of-departure branch 
(daf--master), and that branch can subsequently be rebased onto the master 
branch, which would involve fixing all conflicts.
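
Concretely, the flow I have in mind, using 1234 as a stand-in ticket number:

  git checkout -b daf-1234-master master   # frozen point of departure
  git checkout -b daf-1234-bug             # do the work here
  # open the PR with daf-1234-bug targeting daf-1234-master (never conflicts);
  # after squash-merging the PR into daf-1234-master:
  git checkout daf-1234-master
  git rebase master                        # fix all conflicts once, at the end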

This is yet another step, and is only needed if a PR will be open and 
outstanding long enough to require it, but that's something one can't 
necessarily anticipate will be needed or not, so you'd have to just always work 
this way.

I may try doing my next changes in this model.




Re: Please review mock up idea for checksum calculations in DFDL

2021-08-23 Thread Beckerle, Mike
Comments below see @@@mb


From: Steve Lawrence 
Sent: Monday, August 9, 2021 12:18 PM
To: dev@daffodil.apache.org 
Subject: Re: Please review mock up idea for checksum calculations in DFDL

Some comments:

1) I like the idea that the layers write to a variable, but it seems
like the variables are hard coded in the layer transformer? What are
your thoughts on having the variable defined in a property so that the
user has more control over the naming/definition of it, maybe via
something like dfdlx:runtimeProperties? For example:

  [XML example stripped by the mail archive; an element annotated with
  dfdlx:runtimeProperties="resultVariable=checksumPart1"]

@@@ given that a layer transform can be defined with a unique namespace defined 
by way of a URI, there's never a need to be
concerned about naming conflicts. So I think the ability to choose the variable 
names and provide them is overkill.

I think of the variable definitions as coming from an imported schema that one 
must have to use the layer transform.
Right now we don't have a way of declaring a layer transform when defined 
outside of the daffodil code base in a pluggable fashion, but assume we had 
something like [XML declaration stripped by the mail archive], which would 
also appear in that import file; then accessing and using the layer transform 
and its associated variables would all be obtained from the one import 
statement.

2) For the IPv4 layer, it feels a bit unfortunate to have to split the
CRC into two separate layers, since the CRC algorithm is really just a
checksum over the whole header with just the checksum field treated as
if it were zero. Is it possible to have a property that just specifies
that the Nth byte doesn't contribute? Maybe something like:

  [XML example stripped by the mail archive]

@@@ In the case of the IPv4 checksum, it can just hardcode the fact that it 
skips those specific bytes. I included the splitting into two separate layers 
just to illustrate that this complexity could be handled. I will look at 
recasting this as just one checksum layer and see how it comes out. I think the 
other example, the GPS data format with parity bit computations, is worth 
looking at, as that one is fairly complicated in terms of which bits contribute 
in what ways.

3) As for implementing the checksums, have you put any thought into
making that extensible? For example, I'm wondering if we only have a
single "checksum" layer, and then the dfdlx:runtimeProperties determines
which algorithm to use? E.g.

  [XML examples stripped by the mail archive]

And then people can register different checksum algorithms without
having to reimplement their own layer? Or maybe we keep it simple and
the default checksum layer just supports a handful of the most common
checksums (maybe those supported by some preexisting checksum library?)

People could still implement their own pluggable checksum layer if they
need something we don't support, but this would cover the most common
cases and avoids a proliferation of a bunch of different layers that are
basically the same except for some minor algorithm details.

@@@ This refactoring can of course be done, but it isn't needed to get started. 
Parameters to transform algorithms can be passed in variables, or could be 
specified using an extensible property bag such as dfdlx:runtimeProperties as 
you have shown. We may want a dedicated dfdlx:layerParameters property, since we 
have other layering-specific properties (e.g., for layering length kind), 
rather than using a generic hook. Ideally layering transformers could check 
these properties statically and issue SDEs if misused.
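
To sketch the registry idea (illustrative names only; this is not an actual 
Daffodil API): a single generic checksum layer would consult a registry keyed 
by algorithm name, and could SDE if the named algorithm isn't registered.

    trait ChecksumAlgorithm {
      def name: String
      def compute(bytes: Array[Byte]): Long
    }

    object ChecksumAlgorithms {
      private var registry = Map.empty[String, ChecksumAlgorithm]
      def register(alg: ChecksumAlgorithm): Unit = registry += (alg.name -> alg)
      def lookup(name: String): Option[ChecksumAlgorithm] = registry.get(name)
    }

    // e.g., a CRC-32 backed by the JDK implementation, registered at startup
    // via ChecksumAlgorithms.register(Crc32Algorithm)
    object Crc32Algorithm extends ChecksumAlgorithm {
      val name = "crc32"
      def compute(bytes: Array[Byte]): Long = {
        val crc = new java.util.zip.CRC32()
        crc.update(bytes)
        crc.getValue
      }
    }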


On 7/30/21 2:29 PM, Beckerle, Mike wrote:
> I would like comments on the layering enhancement to enable checksum
> computations in DFDL schemas.
>
>
> This is a high-priority feature for Daffodil's next release 3.2.0, especially
> for cybersecurity applications of Daffodil, which I know a number of us are
> involved in.
>
>
> I've produced a mock-up of how it would look, with lots of annotations in a 
> WIP
> pull request on the ethernetIP DFDL schema. I only did the mock-up for the 
> IPV4
> element, so look at that element in the ethernetIP.dfdl.xsd.
>
> (UDP and TCP packets have their own additional checksums - I didn't mock up
> those, just IPV4)
>
>
> This is at https://github.com/DFDLSchemas/ethernetIP/pull/1
> <https://github.com/DFDLSchemas/ethernetIP/pull/1>
>
>
> This doesn't run, it's just an initial mock-up of the ideas for
> checksum/CRC/parity recomputation capability as a further simple extension of
> the existing DFDL layering extension.
>
>
> The layering extension itself is described here:
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+Base64%2C+Line-Folding%2C+Compression%2C+Etc
> <https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+Base64%2C+Line-Folding%2C+Compression%2C+Etc>
>
>
> I did notice that none of the published DFDLSchemas actually use the layering
> transforms that we've built into Daffodil. There are some non-public DFDL
> schemas that do use this extension to do line-folding transformations.

Please review: DFDL parity calculations also - was: Fw: Please review mock up idea for checksum calculations in DFDL

2021-08-03 Thread Beckerle, Mike
A second example focused on DFDL with parity calculations in a GPS format has 
also been "mocked up"

https://github.com/DFDLSchemas/gps-sps/pull/1

Please review and comment on this pull request also. The GPS spec this is based 
on is also in the repository, in the doc directory.

Thank you

____
From: Beckerle, Mike
Sent: Friday, July 30, 2021 2:29 PM
To: dev@daffodil.apache.org 
Subject: Please review mock up idea for checksum calculations in DFDL


I would like comments on the layering enhancement to enable checksum 
computations in DFDL schemas.


This is a high-priority feature for Daffodil's next release 3.2.0, especially 
for cybersecurity applications of Daffodil, which I know a number of us are 
involved in.


I've produced a mock-up of how it would look, with lots of annotations in a WIP 
pull request on the ethernetIP DFDL schema. I only did the mock-up for the IPV4 
element, so look at that element in the ethernetIP.dfdl.xsd.

(UDP and TCP packets have their own additional checksums - I didn't mock up 
those, just IPV4)


This is at https://github.com/DFDLSchemas/ethernetIP/pull/1


This doesn't run, it's just an initial mock-up of the ideas for 
checksum/CRC/parity recomputation capability as a further simple extension of 
the existing DFDL layering extension.


The layering extension itself is described here:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+Base64%2C+Line-Folding%2C+Compression%2C+Etc


I did notice that none of the published DFDLSchemas actually use the layering 
transforms that we've built into Daffodil. There are some non-public DFDL 
schemas that do use this extension to do line-folding transformations.


There are, however, tests showing the DFDL layering extension in daffodil's 
code base. See

https://github.com/apache/daffodil/blob/master/daffodil-test/src/test/resources/org/apache/daffodil/layers/layers.tdml
and search for dfdlx:layerTransform property.


The mock-up effectively proposes allowing layer transforms to read and write 
DFDL variables, as a means of them accepting input parameters, and as the means 
of them computing and returning output results.
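
As a purely hypothetical sketch of that variable-based contract (none of these 
names are real Daffodil API):

    trait VariableAccess {
      def read(name: String): String               // an input-parameter variable
      def write(name: String, value: String): Unit // an output-result variable
    }

    // A layer computes over the bytes it decodes and publishes the result,
    // so the schema can compare it against the stored checksum field.
    abstract class LayerSketch {
      def compute(data: Array[Byte]): Long
      def onParse(data: Array[Byte], vars: VariableAccess): Unit =
        vars.write("layerResult", compute(data).toString)
    }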


I plan to do a couple other mock-ups of a check-digit calculation, and some 
parity bit computations, but this IPV4 is enough to get the gist of the idea.


I'd appreciate feedback on this, which you can do on the pull request in the 
usual github code review manner.


-mikeb



Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Please review mock up idea for checksum calculations in DFDL

2021-07-30 Thread Beckerle, Mike
I would like comments on the layering enhancement to enable checksum 
computations in DFDL schemas.


This is a high-priority feature for Daffodil's next release 3.2.0, especially 
for cybersecurity applications of Daffodil, which I know a number of us are 
involved in.


I've produced a mock-up of how it would look, with lots of annotations in a WIP 
pull request on the ethernetIP DFDL schema. I only did the mock-up for the IPV4 
element, so look at that element in the ethernetIP.dfdl.xsd.

(UDP and TCP packets have their own additional checksums - I didn't mock up 
those, just IPV4)


This is at https://github.com/DFDLSchemas/ethernetIP/pull/1


This doesn't run, it's just an initial mock-up of the ideas for 
checksum/CRC/parity recomputation capability as a further simple extension of 
the existing DFDL layering extension.


The layering extension itself is described here:

https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Data+Layering+for+Base64%2C+Line-Folding%2C+Compression%2C+Etc


I did notice that none of the published DFDLSchemas actually use the layering 
transforms that we've built into Daffodil. There are some non-public DFDL 
schemas that do use this extension to do line-folding transformations.


There are, however, tests showing the DFDL layering extension in daffodil's 
code base. See

https://github.com/apache/daffodil/blob/master/daffodil-test/src/test/resources/org/apache/daffodil/layers/layers.tdml
and search for dfdlx:layerTransform property.


The mock-up effectively proposes allowing layer transforms to read and write 
DFDL variables, as a means of them accepting input parameters, and as the means 
of them computing and returning output results.


I plan to do a couple other mock-ups of a check-digit calculation, and some 
parity bit computations, but this IPV4 is enough to get the gist of the idea.


I'd appreciate feedback on this, which you can do on the pull request in the 
usual github code review manner.


-mikeb



Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: trying to rerun checks on PR - no option for it?

2021-07-30 Thread Beckerle, Mike
Ok, so to me this restriction is just plain wrong.

A project that has multiple branches moving forward can't even work this way at 
all.

The tests to run are well defined for this change set.

What if we decided to never merge it back to master, but instead start carrying 
forward a dev branch separate from master?

I don't want any contributor to have to rebase onto latest master until we see 
that their stuff works, passes the checks, etc. Rebasing on top of master loses 
the history of comments on the PR (all the commits change). That's also a 
"bug", not a feature; to live with it we simply have to not rebase until very 
late in the game. But the current policy makes the automated CI testing 
workable only if you rebase onto master frequently, which we know is flawed.

Am I incorrect here?






From: Interrante, John A (GE Research, US) 
Sent: Friday, July 30, 2021 10:59 AM
To: dev@daffodil.apache.org 
Subject: RE: trying to rerun checks on PR - no option for it?

CI won't run when the PR has conflicts with the main branch. That's why the 
button isn't there. Darryl needs to rebase his PR, fix the conflict, and push 
his PR again with "git push --force-with-lease".

From: Beckerle, Mike 
Sent: Friday, July 30, 2021 10:51 AM
To: dev@daffodil.apache.org
Subject: EXT: trying to rerun checks on PR - no option for it?

Darryl S. pushed a commit to his PR 
https://github.com/apache/daffodil/pull/601/checks

Since he's a first-time contributor, his checks won't automatically run.

I was going to trigger them manually, but I see no option for doing so.

Wasn't there a button for that? In the past I swear I saw one.

Anybody understand what's up with this before I open an INFRA ticket?


Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com
P +1-781-330-0412



trying to rerun checks on PR - no option for it?

2021-07-30 Thread Beckerle, Mike
Darryl S. pushed a commit to his PR 
https://github.com/apache/daffodil/pull/601/checks

Since he's a first-time contributor, his checks won't automatically run.

I was going to trigger them manually, but I see no option for doing so.

Wasn't there a button for that? In the past I swear I saw one.

Anybody understand what's up with this before I open an INFRA ticket?


Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: How to list offset and length of DFDL elements within native data?

2021-07-01 Thread Beckerle, Mike
Daffodil doesn't currently have this ability.

The raw ingredients are largely there.

For example, the dfdl:valueLength or dfdl:contentLength function can be used as 
a ruler to measure how big something is.

So if you organized a DFDL schema as




Then you can put an element in the schema and literally ask for 
dfdl:valueLength(../measureThis) in a dfdl:outputValueCalc expression.

The idea that we should be able to annotate every element with its start 
position and length, and carry this through as annotated Infoset output, is a 
good one. The debugger hooks have this information and output it in the trace 
output.
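
To make the asked-for capability concrete (see John's question below), here is 
a purely hypothetical signature; no such API exists in Daffodil today:

    object OffsetLenSketch {
      final case class FieldLocation(offsetBits: Long, lengthBits: Long)

      // Would parse `data` against `schema`, recording each element's start
      // position and length, then look up the element named by `fieldPath`.
      def getOffsetLen(
          data: Array[Byte],
          schema: java.net.URI,
          fieldPath: String): Option[FieldLocation] = ???
    }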



From: Interrante, John A (GE Research, US) 
Sent: Thursday, July 1, 2021 10:05 AM
To: dev@daffodil.apache.org 
Subject: How to list offset and length of DFDL elements within native data?

I've been asked a Daffodil / DFDL question that I don't know how to answer.  
The question is:

How to implement a function like get_offset_len(data, schema, 
field_path) -> (offset, length) ?

Do you know a good way (using Daffodil library functions or 
DFDL constructs) to pass some native data, a DFDL schema, an XPath or DPath 
expression referring to an element in the DFDL schema, and get the offset and 
length of that element's field within the native data?

Alternatively, does Daffodil have a way to apply a DFDL schema 
to some native data, construct an infoset from the native data, and list all 
the elements in the infoset along with their DPath, offset, and length?

I searched the Daffodil codebase and wasn't able to find a specific API like 
that although I may have missed something usable.  I scanned the DFDL 
specification and I did find a DFDL function called "dfdl:contentLength" in 
section 18.5.3.  The function's signature is:

dfdl:contentLength($node, $lengthUnits)

Returns the length of the supplied node's SimpleContent region 
for elements of simple type, or ComplexContent region for elements of complex 
type. These regions are defined in Section 9.2 DFDL Data Syntax Grammar. The 
value is returned as an xs:unsignedLong.
The second argument is of type xs:string and must be 'bytes', 'characters', or 
'bits' (Schema Definition Error otherwise) and determines the units of length.

Being able to get each element's length looks like it could help, although a 
note in the same section says that the content length returned by 
dfdl:contentLength() excludes any alignment filling as well as any leading or 
trailing skip bytes. That is, the returned length tells you about the length 
of the content, but does not tell you about the position of the content in the 
native data stream, which is what I was asked to find. Nevertheless, if the 
native data is not text but rather binary data with fixed-size fields, being 
able to list each content field with its length might be sufficient to deduce 
the position of each content field as well.

I wonder which would be easier to do?


  1.  Write a Scala program which calls some Daffodil API to parse some native 
data, construct an infoset from the native data, and list all the elements in 
the infoset along with their DPath, offset, and length?  This would require 
Daffodil to have an API to iterate over each element in the infoset and return 
each element's content length.
  2.  Add DFDL constructs to a DFDL schema which call dfdl:contentLength and 
dfdl:outputValueCalc to append the same information to the infoset?  This would 
require saving the infoset as XML and writing a program or command to read the 
information as a list.
  3.  Another way which I don't know about yet?
  4.  How would we handle any alignment filling as well as any leading or 
trailing skip bytes if the DFDL schema uses them?

Thanks,
John


Java 9's Modules System - not a help for dependency isolation problem

2021-06-29 Thread Beckerle, Mike
We have many *many* dependencies from Daffodil, and this has the potential to 
cause conflicts when creating systems that use Daffodil as part of a larger 
system.

If an application requires libraries A and B (suppose B is daffodil), and those 
each in turn require library C (suppose C is the ICU libraries), but A and B 
each depend on different incompatible versions of C, then you have an 
unresolvable conflict.

This is called the "dependency isolation" problem. One would like to be able to 
link libraries A and B with their own respective isolated versions of library 
C, with no interactions.

In the past I had examined the OSGi modular components system (used by Eclipse) 
and rejected it for excess complexity.
OSGi solves the dependency isolation problem, but it also introduces a lot of 
additional long-running software-system lifecycle mechanism that is quite 
complex, and not well motivated for a library like Daffodil.

I had hoped that the Java 9 modules stuff would be a rethink on all of this, 
and that it would be focused on solving only selective linking (leaving out 
what you don't use), and dependency isolation.

What I have learned is that the Java 9 module system does half of what I wanted.

It successfully allows selective linking and has been used to modularize the 
gigantic Java runtime to enable smaller footprint in memory. That much is good.

But the Java 9 module system does *not* solve the dependency isolation 
problem. There can still be only one copy of each library version shared by 
all dependencies of all modules.

I would include references, but a web search for Java 9 Modules OSGi gets you 
many hits and you can find the discussions easily.

Tools like scala-steward don't eliminate the need for dependency isolation, but 
they do minimize it, especially for open-source software.

People who are building systems combining large libraries like Daffodil with 
commercial slow-changing software libraries having dependencies on older 
versions of many other things, those are the people who really run into the 
need for a dependency isolation solution.

For the time being they will have to continue to rely on running incompatible 
things in separate processes or containers, or on using OSGi if they want 
everything in the same process/JVM.

Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: So many automated PRs... do we need to count them separately?

2021-06-24 Thread Beckerle, Mike
This link you sent:

https://github.com/apache/daffodil/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-desc

is really useful. Like I noticed all my WIP PRs are down at the bottom of the 
list 




So many automated PRs... do we need to count them separately?

2021-06-24 Thread Beckerle, Mike
One of the metrics of project health at the ASF is the number of PRs and 
commits on projects.

Ours have been massively inflated by these scala-steward and dependabot PRs 
(dependabot is new, but I was already observing this just from the Scala update 
bot).

This isn't terrible, and presumably as other projects adopt this improvement to 
the SDLC all the numbers will adjust upward, with expectations adjusting upward 
similarly.

I just wanted everyone to understand that "non-automated PRs" is perhaps a 
future metric of note, and that we're seeing an inflated number of PRs and 
commits and emails now due to these bots.

I believe that these improve the quality of our software, and reduce 
maintenance burdens on the team, so I hope more projects adopt this.

I just wanted everyone to understand that there are some odd ASF "community 
health" metric implications of this that I will raise in the next Apache 
Daffodil Board report, just to advise them that our project (like others) is 
experiencing this big flurry and new steady state, of automated PRs.

I doubt this is news, but it's worth mentioning given that we're a small new 
project and the instant growth due to these bots is a one-time transient not 
reflective of (and in fact overwhelming) our organic community growth, which is 
non-zero, but slow. (I'm ok with it, thank you new contributors!)

I am interested in people's thoughts about this notion of counting automated 
PRs separately from human-originated PRs.

I am also interested in whether people find this flurry of constant bot 
activity disruptive. I admit I find it so. I am going to need to create email 
rules to segregate this email traffic into folders so they're not in my daily 
view, and I wonder if we need to have an informal policy that people aren't 
expected to respond/review these except but once a week/month or some such.

Thoughts welcome.

-mikeb



Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: test cases - examples

2021-06-17 Thread Beckerle, Mike

Depends on what you mean by available.

They're in the daffodil source tree.

The daffodil-test module is a big suite of tests, organized by section numbers 
of the DFDL spec,
all expressed as TDML.

https://github.com/apache/daffodil/tree/master/daffodil-test/src/test/resources/org/apache/daffodil

Unfortunately, the section numbers evolved and no longer match the final DFDL 
spec., but most are the same.




From: Attila Horvath 
Sent: Thursday, June 17, 2021 1:30 PM
To: dev@daffodil.apache.org 
Subject: test cases - examples

ALCON

I assume there is a repository of test cases with which to validate and
regression test various corresponding daffodil features.

Are these test cases available to users for reference as examples?

Thx

Attila


transformation example

2021-06-17 Thread Beckerle, Mike
DFDL isn't normally thought of as a data transformation language.

Yet it has computed elements, hidden groups, expressions that can refer to 
elements in arrays using the index position within the current array (via the 
dfdl:occursIndex() function).

These result in it having substantial data transformation capabilities.

For a while I have said that I bet one can transpose a matrix in DFDL.

So I finally created an example that does so.  It takes a representation of 
data as a pair of lists, and creates a logical infoset that is a list of pairs.

https://github.com/OpenDFDL/examples/tree/master/pairsTransform
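
For intuition, the same reshaping in plain Scala (REPL-style): parsing zips two 
parallel lists into one list of pairs, and unparsing needs the inverse unzip.

    val sides  = List("x", "y", "z")
    val values = List(1, 2, 3)
    val pairs  = sides.zip(values) // List((x,1), (y,2), (z,3))
    val (s2, v2) = pairs.unzip     // recovers the two original lists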

The conclusion of this little experiment is that while this is possible for 
parsing, you can't invert the process perfectly in unparsing due to some DFDL 
v1.0 restrictions. I may do experiments in Daffodil to lift those restrictions.

The notion of an entirely schema-based transform language is very interesting 
given that typical XML transformation languages such as XSLT and XQuery are 
both template/instance-document based, not schema based.




Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: Use GitHub Releases

2021-06-09 Thread Beckerle, Mike
I think it is fine to have github releases and convenience binaries served from 
there, with a couple constraints based on not undermining the important ASF 
policies that provide for verifiable software supply chain.

If the github releases and artifacts correspond to official Apache releases, 
then:

1) they have to be identical bit-for-bit to those provided from ASF and maven 
central.

2) both we and our users have to be able to readily verify that this is the 
case (same file names, same hashes, easy to find links to the official ASF 
locations that store the hashes, have the signer keys to verify against, etc.)

If these github-based "releases" are intermediate/snapshot kinds of things, 
then I think the only requirement is that it's clear that's what they are 
(distinct file names, etc.), so they can't be confused with any official 
release.

I think experimentation to see what works well for the debugger/IDE is very 
sensible.


From: John Wass 
Sent: Wednesday, June 9, 2021 2:35 PM
To: dev@daffodil.apache.org 
Subject: Re: Use GitHub Releases

> GitHub does automatically create "Releases" when we create a new tag.

The UI rolls them together, but they are two separate things in the API.
Daffodil has no releases according to the API.

https://api.github.com/repos/apache/daffodil/tags
https://api.github.com/repos/apache/daffodil/releases
https://docs.github.com/en/rest/reference/repos#list-releases
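
For example, a quick Scala REPL check using only the standard library (the 
regex extraction is for illustration; real code would use a JSON parser):

    import scala.io.Source

    val json = Source.fromURL(
      "https://api.github.com/repos/apache/daffodil/releases").mkString
    val TagName = "\"tag_name\"\\s*:\\s*\"([^\"]+)\"".r
    val tags = TagName.findAllMatchIn(json).map(_.group(1)).toList
    // currently Nil, since there are only tags, no releases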


> Is there some API that's not available unless we manually create releases?

We can't attach assets to a tag, only a release.


> Are you looking to have convenience binaries also published to these
release?

Yes, asset fetching along with version lookup was the point of the post, I
should have mentioned that ;)

Do all Daffodil artifacts need to be published? No - there is Maven Central
for the jars. But what about publishing the applications as assets? That
would be the CLI and, in the future, a debugger backend.


> What kinds of information are you looking to query from the releases?

At first the available releases and their assets, but there is additional
metadata in a release object that might be interesting at some point.


> That has some basic version and release date information. And as I
mentioned before, it requires that projects keep it up to date.

The GitHub Release API does provide a nice single entrypoint for query and
fetch of assets (and metadata for future use).  Looking at these Apache
references, it doesn't appear to be as robust.




On Wed, Jun 9, 2021 at 12:54 PM Steve Lawrence  wrote:

> GitHub does automatically create "Releases" when we create a new tag.
>
>   https://github.com/apache/daffodil/releases
>
> Is there some API that's not available unless we manually create
> releases? Are you looking to have convenience binaries also published to
> these release?
>
> What kinds of information are you looking to query from the releases?
>
> I know some projects (including Daffodil) keep an updated "Description
> Of A Project" (doap) file, which is parsed by Apache to fill out project
> information that can be queried here:
>
>   https://projects.apache.org/project.html
>
> This is our doap file:
>
>   https://daffodil.apache.org/doap.rdf
>
> And this is the project page that is generated from that file:
>
>   https://projects.apache.org/project.html?daffodil
>
> That has some basic version and release date information. And as I
> mentioned before, it requires that projects keep it up to date. I'm not
> sure how many do if you're interested about other projects.
>
>
> On 6/9/21 12:36 PM, John Wass wrote:
> >> the simplest is to ask
> >
> > Well the simplest for __me__ is to ask, this will add some overhead to
> the
> > release process for someone.  It looks like some Apache projects do
> GitHub
> > releases, most don't.
> >
> > Also looking for an Apache API to query releases and their artifacts.
> >
> >
> > On Wed, Jun 9, 2021 at 12:13 PM John Wass  wrote:
> >
> >> We have been using the GitHub API to collect (representative) releases
> of
> >> Daffodil during some prototype work.  However when looking at the main
> >> Daffodil repo I see there are no releases published there.
> >>
> >> There are probably some other ways to work around this, but the simplest
> >> is to ask if publishing releases to GitHub is something that can be done
> >> going forward?
> >>
> >>
> >
>
>


VSCode License - we're good

2021-06-09 Thread Beckerle, Mike
So I think our debugger work won't involve direct linking to VSCode, but anyway 
I checked on the licenses.

The source code license for VSCode is the MIT License, which is Category A.  So 
if we found we need to, for example, embed a captive custom version of VSCode, 
we could embed this source. I don't foresee any need for that, but it's nice to 
know it's possible.

The binary license for VSCode is the standard Microsoft Software License, which 
makes it off limits for inclusion. It is not an open-source license.  But...

But, given that VSCode is OSS, there is https://vscodium.com/ which is the pure 
OSS version of VSCode under the MIT License. This also doesn't have the 
Microsoft-specific telemetry and branding that the regular VSCode binary has.

So we're fine with building things for VSCode/Codium as we have ample freedoms 
here.


Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



XML Catalogs feature - should we deprecate it

2021-06-04 Thread Beckerle, Mike
I spent quite a bit of time this week trying to get 2 DFDL schemas using XML 
Catalogs to work.

I succeeded, but I find XML Catalogs very fragile, with important aspects that 
don't work (I was unable to get relative catalogs to work at all), and I am 
wondering whether we really need to support XML Catalogs at all. Our 
classpath-based resolution of schema locations works quite well and can be used 
to make quite modular DFDL schemas.

Currently, XML schemas built into Daffodil (e.g., tdml.xsd, the schema for DFDL 
schemas, etc.) are resolved from a built-in XML catalog, but the fact that this 
uses a catalog is not particularly important. You can think of them as simply 
built into our resolver.

XML Schemas referenced using the schemaLocation attribute on an 
xs:import/xs:include or the xsi:schemaLocation on XML instance documents are 
resolved by either:

1) relative to the file containing the reference, i.e., the file containing the 
include/import statement

2) relative to the root of any directory or jar on the classpath, searching 
them in classpath order. First one found wins.
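
In script form, mechanism (2) is just the standard JVM classpath-resource 
lookup (resource name hypothetical):

    // resolves to the first matching entry found on the classpath
    val url = getClass.getClassLoader
      .getResource("com/example/payload.dfdl.xsd")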

This enables one to create multi-part DFDL schemas, such as for an envelope 
format, and a payload format carried inside the envelope format. Each can be 
testable separately, and a combining schema can be created that puts the 
payload on the classpath first, followed by the envelope format, and combines 
the two.

Given that this works, and works well, I am wondering if we should just 
deprecate the XML catalog feature.

Thoughts?

Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: The future of the daffodil DFDL schema debugger?

2021-05-26 Thread Beckerle, Mike
I think the point was: to understand debugging in Daffodil, one must understand, 
and potentially have to display, the data structures that the runtime maintains.

Furthermore, some of the actions the parser/unparser takes are universal, like 
invoking a parser. Others require finer detail than that - e.g., delimiter 
scanning certainly needs more detailed treatment from the debugger.

But to a first approximation, there should be some way to display, inspect, and 
potentially manipulate each piece of state.


From: John Wass 
Sent: Wednesday, May 26, 2021 2:46 PM
To: dev@daffodil.apache.org 
Subject: Re: The future of the daffodil DFDL schema debugger?

> Some thoughts re: data format debugger
> I suggest we enumerate

Mike, are you saying there is some ground work to lay for this in Daffodil
itself, or are these things which the debugger needs to model after
existing concepts?


On Mon, May 24, 2021 at 12:48 PM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser,  (every parse combinator or
> primitive, their subactions)
>   *   and of the unparser, (every unparse combinator, primitive,
> suspension,...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers. (e.g., looking at the values of variables) Others will be
> specific to DFDL needs. (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking)
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where not
> anticipating how it needs to work is likely to lock one out of a future
> scenario that accommodates it.
>
> So the parser doesn't produce an infoset. It  produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset, It consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
>
>
> 
> From: Steve Lawrence 
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org 
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X number of
> lines, which is most all a user cares about when

Re: The future of the daffodil DFDL schema debugger?

2021-05-24 Thread Beckerle, Mike
 Maybe a solution is structured like this
>>>> - daffodil-debug-api:
>>>>   - protocol model
>>>>   - interfaces: debugger / IO adapter / etc
>>>>   - lives in daffodil repo (new subproject?)
>>>> - daffodil-debug-io-NAME
>>>>   - provides implementation of a specific IO adapter
>>>>   - multiple projects possible (daffodil-debugger-akka,
>>>> daffodil-debugger-zio, etc)
>>>>   - supported ones live in their own subprojects, but other can be
>>>> plugged in from external sources
>>>>   - ability to support multiple implementations reduces risk of lock-in
>>>> - debugger applications
>>>>   - maintained in external repositories
>>>>   - depending on the IO implementation these could execute be in
>> separate
>>>> process or on separate machine
>>>>   - like Steve said, could be any language / framework
>>>>
>>>> Three types of reference implementations / sample applications could
>> also
>>>> guide the development of the API
>>>>   1. a replacement for the existing TUI debugger, expected to end up
>> with
>>>> at minimum the same functionality as the current one.
>>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>>>   3. an IDE integration
>>>>
>>>> Thoughts?
>>>>
>>>> Also I'm working on some reference implementations of these concepts
>>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
>> code
>>>> is here https://github.com/jw3/example-daffodil-debug
>>>>
>>>>
>>>>
>>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence 
>>>> wrote:
>>>>
>>>>> Yep, something like that seems very reasonable for dealing with large
>>>>> infosets. But it still feels like we still run into usability issues.
>>>>> For example, what if a user wants to see more? We need some
>>>>> configuration options to increase what we've ellided. It's not big, but
>>>>> every new thing that needs configuration adds complexity and decreases
>>>>> usability.
>>>>>
>>>>> And I think the only reason we are trying to spend effort elliding
>>>>> things is because we're limited to this gdb-like interface where you
>> can
>>>>> only print out a little information at a time.
>>>>>
>>>>> I think what would really help is to dump this gdb interface and instead use
>>>>> multiple windows/views. As a really close example to what I imagine, I
>>>>> recently came across this hex editor:
>>>>>
>>>>> https://www.synalysis.net/
>>>>>
>>>>> The screenshots are a bit small so it's not super clear, but this tool
>>>>> has one view for the data in hex, and one view for a tree of parsed
>>>>> results (which is very similar to our infoset). The "infoset" view has
>>>>> information like offset/length/value, and can be related back to the
>>>>> data view to find the actual bits.
>>>>>
>>>>> I imagine the "next generation daffodil debugger" to look much like
>>>>> this. As data is parsed, the infoset view fills up. This view could act
>>>>> like a standard GUI tree so you could collapse sections or scroll
>> around
>>>>> to show just the parts you care about, and have search capabilities to
>>>>> quickly jump around. The advantage here is you no longer really need
>>>>> automated eliding or heuristics for what the user *might* care about.
>>>>> You just show the whole thing and let user scroll around. As daffodil
>>>>> parses and backtracks, this tree grows or shrinks.
>>>>>
>>>>> I also imagine you could have a cursor moving around the hex view, so
>> as
>>>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>>>> integers), one could update this data view to show what daffodil is
>>>>> doing and where it is.
>>>>>
>>>>> I also imagine there could be other views as well. For example, a schema
>>>>> view to show where in the schema daffodil is, and to add/remove
>>>>> breakpoints. And an information view for things like variables,
>> in-scope
>>>>> delimiters, PoU's, etc.
>>>>>
>>>>> The only reason I mention a debug protco

Re: broke master 3.2.0-SNAPSHOT branch with latest commit

2021-05-20 Thread Beckerle, Mike
Note: it's only broken on the Java 8 JVM.

From: Beckerle, Mike 
Sent: Thursday, May 20, 2021 3:08 PM
To: dev@daffodil.apache.org 
Subject: broke master 3.2.0-SNAPSHOT branch with latest commit

Will fix shortly.


Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



broke master 3.2.0-SNAPSHOT branch with latest commit

2021-05-20 Thread Beckerle, Mike
Will fix shortly.


Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: DAFFODIL-1927

2021-05-20 Thread Beckerle, Mike
This is an important change. There is design work to do here for how it should 
work.

There are two features in Daffodil that dynamically load extensions already: 
User-defined functions, and Validators.

So you should be able to look at those to figure out how to dynamically load 
layer implementations from the classpath.

One possible approach is to take one of the existing layering transforms, like 
the one for AIS payload armoring, and make that into a dynamically loaded one.

There are already tests for that, so you would be able to tell easily if your 
dynamically loaded version works the same without having to invent a lot of 
tests.

And AIS is an obscure and special-purpose thing that really should be outside 
of daffodil as a loadable layer transform.

Arguably, all of the layer transforms should be in external libraries that live 
in separate jars.

See 
daffodil-runtime1/src/main/scala/org/apache/daffodil/layers/AISTransformer.scala

You'll see that these are "registered" in the file LayerTransformer.scala, and 
that mechanism is what must change to get the transformer from an external 
classpath/jar.
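
The usual JVM pattern for this, and I believe what the UDF service already 
does, is java.util.ServiceLoader. A sketch with hypothetical names (this is 
not the eventual layer API):

    import java.util.ServiceLoader

    trait LayerTransformerProvider {
      def name: String
    }

    object LayerTransformerLoader {
      // Discovers implementations listed (by fully qualified class name) in
      // the classpath resource META-INF/services/LayerTransformerProvider.
      def find(name: String): Option[LayerTransformerProvider] = {
        val it = ServiceLoader.load(classOf[LayerTransformerProvider]).iterator()
        while (it.hasNext) {
          val p = it.next()
          if (p.name == name) return Some(p)
        }
        None
      }
    }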

I am happy if the layer transformers have to be written in Scala for now. A 
later enhancement can be to make a polished Java API for these extensions. The 
UDF feature allows writing UDFs in either Java or Scala, but I don't think we 
need to do a Java API as yet for this layering stuff, so I'd start by just 
doing Scala for now.

If a layer definition moves out to a separate jar, the testing for it also 
needs to move outside as well.
For the unit tests this is straightforward to do. But there are TDML-based tests in

daffodil-test/src/test/resources/org/apache/daffodil/layers/ais.tdml
daffodil-test/src/test/scala/org/apache/daffodil/layers/TestAIS.scala

The dynamic loading aspect of the user-defined function (UDF) feature is in

daffodil-runtime1/src/main/scala/org/apache/daffodil/udf/UserDefinedFunctionService.scala

It has TDML tests as well in:

daffodil-test/src/test/scala/org/apache/daffodil/udf/TestUdfsInSchemas.scala

but the UDFs used to test are themselves defined in

daffodil-udf/src/test/scala/org/sgoodudfs/example/StringFunctions/StringFunctionsProvider.scala

This is a separate module of Daffodil. Note these are for testing, and so live 
under src/test in that module; they're not delivered as part of the Daffodil 
jars.

The primary developer of UDF was Olabusayo Kilo, and the primary developer of 
the dynamically loadable validations was John Wass (jw3)


From: Sandeep Kumar 
Sent: Thursday, May 20, 2021 7:47 AM
To: dev@daffodil.apache.org 
Subject: DAFFODIL-1927

Hi Team,

I'd like to work on the following task :
https://issues.apache.org/jira/browse/DAFFODIL-1927
Can you please help with the initial steps for this?

Regards,
Sandeep


ICU library version - was: IBM DFDL is upgrading ICU level

2021-05-19 Thread Beckerle, Mike
DFDL implementors at IBM have noticed some issues with the ICU library worth 
noting.
There is an ICU pull request with the fix, targeted at ICU version 70.1.


From: 
Subject: IBM DFDL is upgrading ICU level

Hi Mike

We are moving up the level of ICU that IBM DFDL is built with, as it is still 
on 51 which is out-of-support. We are trying 68.x.
In the process we found several of our regression tests failed due to behaviour 
changes in lax decimal/calendar processing.
If you recall we deliberately changed the DFDL 1.0 spec to make lax 
implementation-dependent/defined (I forget which).
We've analysed the differences and most are to do with bug fixes or other 
changes that are acceptable or benign or we don't think any of our customer 
will hit.
However, we got 400 failures in our Java version which didn't appear in our C 
version.
This looks to have been caused by a regression somewhere, we think back in 
62 - see https://unicode-org.atlassian.net/browse/ICU-20425.
ICU have accepted there is a problem and the fix is in PR 
https://github.com/unicode-org/icu/pull/1726 which is targeted at 70.1.
Letting you know as moving to 70.1 and higher might therefore cause a Daffodil 
behaviour change.

Regards

Steve Hanson

IBM Hybrid Integration, Hursley, UK
Architect, IBM 
DFDL
Co-Chair, OGF DFDL Working Group



Re: [GitHub] [daffodil] stevedlawrence commented on pull request #567: Update scala-library, scala-reflect to 2.13.6

2021-05-18 Thread Beckerle, Mike
Is scala reflect 2.13.6 actually about scala 2.13, or is it just a coincidence 
of the numbering that their sub-version number just rolled up to 13?


From: GitBox 
Sent: Tuesday, May 18, 2021 12:55 PM
To: comm...@daffodil.apache.org 
Subject: [GitHub] [daffodil] stevedlawrence commented on pull request #567: 
Update scala-library, scala-reflect to 2.13.6


stevedlawrence commented on pull request #567:
URL: https://github.com/apache/daffodil/pull/567#issuecomment-843359666


   We should just close this PR.

   There's a significant amount of changes needed to get Daffodil compiling on 
2.13.x, and we have a bug already open for this issue: 
[DAFFODIL-2152](https://issues.apache.org/jira/browse/DAFFODIL-2152). Also, PR 
#246 has some initial changes to get 2.13 to work. That PR is likely too old to 
cleanly rebase, but that is a better reference for 2.13 updates than this PR.


--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: Windows debug clues needed

2021-05-18 Thread Beckerle, Mike
So some of these errors seem to be font & charset related.

I installed the Japanese language pack, so I have Unicode character capability, 
but perhaps I have to set something up to tell it to use Unicode and a sensible 
font which has the Kanji characters that are in our tests, for example.



From: Steve Lawrence 
Sent: Tuesday, May 18, 2021 12:35 PM
To: dev@daffodil.apache.org 
Subject: Re: Windows debug clues needed

I'm not sure where it exists on windows, but if you run

  git config --global core.autocrlf true

then it should enable it globally. I think you could also try
reinstalling git for Windows. Last I installed git on Windows (a long
time ago) I think the installer let me pick how to handle line endings.
I'm not sure if it explicitly mentions the autocrlf option though, or if
it even gives that option anymore.


On 5/18/21 12:30 PM, Beckerle, Mike wrote:
> The XML comments issue was definitely part of it. Found that one and fixed it 
> with the override def comment(..) as you suggest.
>
> There are still other tests failing though.
>
> I have been spinning up a windows dev environment and I noticed that the 
> comments contain CRLFs on windows, and likely due to autoCRLF stuff, do not 
> on Linux.
>
> Where do I find this autoCRLF setting? The .gitconfig in my linux home 
> doesn't have an autocrlf setting. And I am not finding one on Windows, though 
> not sure entirely where that would live on Windows.
>
>
> 
> From: Steve Lawrence 
> Sent: Tuesday, May 18, 2021 11:57 AM
> To: dev@daffodil.apache.org 
> Subject: Re: Windows debug clues needed
>
> Finished the review and I didn't find anything, but I just noticed that
> some of the failed TDML tests have infosets that include comments that
> include newlines.
>
> I'm guessing these comments aren't stripped out by the loader and also
> aren't normalized. So when git autocrlf kicks in, it changes these
> comments to have \r\n, we don't normalize that, and the the infoset does
> contain \r's and we error.
>
> It looks like the ConstructingParser has a 'def comment' function, so
> maybe we just need to override that to normalize the comment contents?
>
> - Steve
>
>
> On 5/18/21 8:58 AM, Beckerle, Mike wrote:
>> My PR https://github.com/apache/daffodil/pull/560
>> <https://github.com/apache/daffodil/pull/560>
>>
>> Keeps failing its tests on MS-Windows.
>>
>> I am unable to reproduce the failures on Linux obviously.
>>
>> But... I am also unable to reproduce these failures on MS-Windows.
>>
>> I have installed sbt, git, intellij idea, emacs, etc. all on MS-Windows. 
>> When I
>> run the tests via sbt test... they all pass.
>>
>> So one possibility is that I have git configured differently (w.r.t the 
>> autocrlf
>> stuff) than is done by the windows CI systems.
>>
>> Suggestions?
>>
>>
>>
>>
>> Mike Beckerle | Principal Engineer
>>
>> mbecke...@owlcyberdefense.com
>>
>> P +1-781-330-0412
>>
>
>



Re: Windows debug clues needed

2021-05-18 Thread Beckerle, Mike
The XML comments issue was definitely part of it. Found that one and fixed it 
with the override def comment(..) as you suggest.
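
That is, roughly this - an untested sketch of the shape of the fix (signature 
per scala-xml's ConstructingParser):

    import scala.io.Source
    import scala.xml.Comment
    import scala.xml.parsing.ConstructingParser

    class CommentNormalizingParser(in: Source, preserveWS: Boolean)
        extends ConstructingParser(in, preserveWS) {
      // normalize CRLF so comments compare equal across Windows/Linux checkouts
      override def comment(pos: Int, txt: String): Comment =
        Comment(txt.replace("\r\n", "\n"))
    }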

There are still other tests failing though.

I have been spinning up a Windows dev environment, and I noticed that the 
comments contain CRLFs on Windows but, likely due to the autocrlf stuff, do not 
on Linux.

Where do I find this autoCRLF setting? The .gitconfig in my linux home doesn't 
have an autocrlf setting. And I am not finding one on Windows, though not sure 
entirely where that would live on Windows.



From: Steve Lawrence 
Sent: Tuesday, May 18, 2021 11:57 AM
To: dev@daffodil.apache.org 
Subject: Re: Windows debug clues needed

Finished the review and I didn't find anything, but I just noticed that
some of the failed TDML tests have infosets that include comments that
include newlines.

I'm guessing these comments aren't stripped out by the loader and also
aren't normalized. So when git autocrlf kicks in, it changes these
comments to have \r\n, we don't normalize that, and then the infoset does
contain \r's and we error.

It looks like the ConstructingParser has a 'def comment' function, so
maybe we just need to override that to normalize the comment contents?

- Steve


On 5/18/21 8:58 AM, Beckerle, Mike wrote:
> My PR https://github.com/apache/daffodil/pull/560
> <https://github.com/apache/daffodil/pull/560>
>
> Keeps failing its tests on MS-Windows.
>
> I am unable to reproduce the failures on Linux obviously.
>
> But... I am also unable to reproduce these failures on MS-Windows.
>
> I have installed sbt, git, intellij idea, emacs, etc. all on MS-Windows. When 
> I
> run the tests via sbt test... they all pass.
>
> So one possibility is that I have git configured differently (w.r.t the 
> autocrlf
> stuff) than is done by the windows CI systems.
>
> Suggestions?
>
>
>
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com
>
> P +1-781-330-0412
>



Windows debug clues needed

2021-05-18 Thread Beckerle, Mike
My PR https://github.com/apache/daffodil/pull/560

Keeps failing its tests on MS-Windows.

I am unable to reproduce the failures on Linux obviously.

But... I am also unable to reproduce these failures on MS-Windows.

I have installed sbt, git, intellij idea, emacs, etc. all on MS-Windows. When I 
run the tests via sbt test... they all pass.

So one possibility is that I have git configured differently (w.r.t the 
autocrlf stuff) than is done by the windows CI systems.

Suggestions?




Mike Beckerle | Principal Engineer

mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: [VOTE] Release Apache Daffodil 3.1.0-rc2

2021-05-17 Thread Beckerle, Mike
+1

I checked:

  *   all developer tests pass (test, it:test)
  *   all IBM compatibility tests pass for all known schemas
      *   includes all the portable ones at github DFDLSchemas, and a few 
          other non-public schemas
  *   all known schemas pass daffodil tests, excepting vcard, which is a 
      known, documented regression
      *   this includes testing the user-defined-functions feature, which is 
          used by one non-public DFDL schema
  *   examples like the Java API "helloworld" example pass their tests.
  *   Release notes page looks good.
  *   Scaladoc/javadoc looks good.




From: Steve Lawrence 
Sent: Monday, May 17, 2021 10:21 AM
To: dev@daffodil.apache.org 
Subject: Re: [VOTE] Release Apache Daffodil 3.1.0-rc2

+1

I checked:

[OK] hashes and signatures of source and helper binaries are correct
[OK] signature of git tag is correct
[OK] source release matches git tag (minus KEYS file)
[OK] source compiles and all tests pass (both en_US and de_DE LANG)
[OK] jars in helper binaries and the repository are exactly the same
[OK] jars built from source are exactly the same as helper binary jars
[OK] src, binaries, and jars include correct LICENSE/NOTICE
[OK] RAT check passes
[OK] no unexpected binaries in source
[OK] distributed dependencies in helper binaries are same as from maven
[OK] rpm and msi install and run with basic usage
[OK] ~60 public and private DFDL schema projects pass tests
[OK] No issues found in JavaDoc and ScalaDoc


On 5/17/21 9:55 AM, Interrante, John A (GE Research, US) wrote:
> +1
>
> I checked (for most items, using WSL2/Ubuntu 20.04 on my laptop):
>
> [OK] rpm installs correctly on Fedora Workstation 34
>
> [OK] some daffodil CLI commands work (generate c, test on runtime2 examples)
>
> [OK] signature of git tag is correct
>
> [OK] hash and signature of each download is correct
>
> [OK] src, bins, and jars include correct LICENSE/NOTICE/DISCLAIMER
>
> [OK] source release matches git tag (minus KEYS file)
>
> [OK] no unexpected binaries in source
>
> [OK] source compiles and all tests pass
>
> [OK] RAT check passes
>
> [OK] jars built from source have the same content as helper binary jars
>
> John
>
> -Original Message-
> From: Steve Lawrence 
> Sent: Friday, May 14, 2021 3:27 PM
> To: dev@daffodil.apache.org
> Subject: EXT: [VOTE] Release Apache Daffodil 3.1.0-rc2
>
> Hi all,
>
> I'd like to call a vote to release Apache Daffodil 3.1.0-rc2.
>
> All distribution packages, including signatures, digests, etc. can be found 
> at:
>
> https://dist.apache.org/repos/dist/dev/daffodil/3.1.0-rc2/
>
> Staging artifacts can be found at:
>
> https://repository.apache.org/content/repositories/orgapachedaffodil-1023/
>
> This release has been signed with PGP key 36F3494B033AE661, corresponding to 
> slawre...@apache.org, which is included in the KEYS file here:
>
> https://downloads.apache.org/daffodil/KEYS
>
> The release candidate has been tagged in git with v3.1.0-rc2.
>
> For reference, here is a list of all closed JIRAs tagged with 3.1.0:
>
> https://s.apache.org/daffodil-issues-3.1.0
>
> For a summary of the changes in this release, see:
>
> https://daffodil.apache.org/releases/3.1.0/
>
> Please review and vote. The vote will be open for at least 72 hours (Monday, 
> 17 May 2021, 4pm EST).
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>



Re: [VOTE] Release Apache Daffodil 3.1.0-rc1

2021-05-14 Thread Beckerle, Mike
Agree that we should go to an rc2.

We are going to have one IBM compatibility regression in this release, on 
vcard. But there should be no other surprises.




From: Adams, Joshua 
Sent: Friday, May 14, 2021 12:23 PM
To: dev@daffodil.apache.org 
Subject: Re: [VOTE] Release Apache Daffodil 3.1.0-rc1

That seems reasonable to me.  Official releases should clearly demonstrate 
cross compatibility with IBM's implementation.

Josh

From: Steve Lawrence 
Sent: Friday, May 14, 2021 11:58 AM
To: dev@daffodil.apache.org 
Subject: Re: [VOTE] Release Apache Daffodil 3.1.0-rc1

I've found a bug in the TDML runner that causes the IBM DFDL Crosstester
to fail:

https://issues.apache.org/jira/browse/DAFFODIL-2517

This only affects the IBM DFDL Crosstester and not Daffodil, but there
isn't really a great workaround, since there is no way for IBM to report
correct final parse/unparse bit position.

I'll have a pull request ready shortly.

Thoughts on cancelling rc1 and including this fix in an rc2? The bug only
affects the IBM DFDL cross tester, but there really is no good workaround.

On 5/12/21 1:47 PM, Steve Lawrence wrote:
> Hi all,
>
> I'd like to call a vote to release Apache Daffodil 3.1.0-rc1.
>
> All distribution packages, including signatures, digests, etc. can be
> found at:
>
> https://dist.apache.org/repos/dist/dev/daffodil/3.1.0-rc1/
>
> Staging artifacts can be found at:
>
> https://repository.apache.org/content/repositories/orgapachedaffodil-1020/
>
> This release has been signed with PGP key 36F3494B033AE661,
> corresponding to slawre...@apache.org, which is included in the KEYS
> file here:
>
> https://downloads.apache.org/daffodil/KEYS
>
> The release candidate has been tagged in git with v3.1.0-rc1.
>
> For reference, here is a list of all closed JIRAs tagged with 3.1.0:
>
> https://s.apache.org/daffodil-issues-3.1.0
>
> For a summary of the changes in this release, see:
>
> https://daffodil.apache.org/releases/3.1.0/
>
> Please review and vote. The vote will be open for at least 72 hours
> (Saturday, 15 May 2021, 2pm EST).
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>



Re: release 3.1.0 critical bugs still outstanding

2021-05-12 Thread Beckerle, Mike
Hmmm. I think isolation just means not group or world readable.

On Unix/Linux, that's "chmod go-rwx".
The equivalent on Windows I'm not sure of.

From: Interrante, John A (GE Research, US) 
Sent: Wednesday, May 12, 2021 12:22 PM
To: dev@daffodil.apache.org 
Subject: release 3.1.0 critical bugs still outstanding

I looked at the instructions for generating an Apache code signing key.  I 
believe I can do everything on my work laptop except for one thing - storing my 
$GPGHOME/.gnupg directory with my code signing key in encrypted and isolated 
storage.  My work laptop's disk is encrypted (per GE security policy) but it 
doesn't offer isolation unless there's a clever way of enforcing isolation?

I would need to plug a separate USB device into my work laptop and it would 
have to be a Yubico YubiKey 5 or a Yubico YubiKey 5C since GE's data guardian 
software doesn't allow most USB devices to work (YubiKey devices are an 
exception).

Do you know of anyone who uses a YubiKey device to generate and/or store their 
GPG code signing key, and how to use a YubiKey device in the release signing 
workflow?  I found these articles but they don't say much about using the 
YubiKey in a release signing workflow similar to ours...

https://support.yubico.com/hc/en-us/articles/360016614840-Code-Signing-with-the-YubiKey-on-Windows
https://ocramius.github.io/blog/yubikey-for-ssh-gpg-git-and-local-login/
https://eclipsesource.com/blogs/2016/11/25/yubikey-code-signing-with-a-smart-card/

John

-Original Message-
From: Steve Lawrence 
Sent: Tuesday, May 11, 2021 10:23 AM
To: dev@daffodil.apache.org
Subject: EXT: Re: release 3.1.0 critical bugs still outstanding

Agreed, I think the CLI change is worth getting in for 3.1.0. Shouldn't take 
too long to get it merged.

I'll volunteer to be release manager for 3.1.0. That's probably for the best 
since I'm the most familiar with the process/script, and it's possible 
something could be broken with this being the first non-incubator release.

Though, I would recommend that other PMC/committers go through the process to 
create and publish a gpg key so that when it does come time to do a release, at 
least that part is out of the way.


On 5/11/21 10:10 AM, Interrante, John A (GE Research, US) wrote:
> Yes, we will be in good shape for the 3.1.0 release after we update the CLI 
> help information, which I'd like to complete today.  We've also got at least 
> 4-5 days to add validation.md and anything else we want to the daffodil-site 
> before the earliest possible official release announcement could go out.
>
> I'm taking this Thursday and Friday off which is a consideration in 
> volunteering to be the release manager.  I'd have to generate a signing key, 
> add its public part to the KEYS file, commit it to the Daffodil release 
> distribution SVN repository, send the fingerprint to the Apache key server, 
> build a release candidate, and start a vote before I go out of town on 
> Thursday.  Steve, perhaps I should wait for the following release unless you 
> think I'd be able to do all these steps quickly within a few hours.
>
> John
>
> -Original Message-
> From: Beckerle, Mike 
> Sent: Monday, May 10, 2021 2:43 PM
> To: dev@daffodil.apache.org
> Subject: EXT: Re: release 3.1.0 critical bugs still outstanding
>
> I agree we should do the release.
>
> I am in the thick of debugging DAFFODIL-1422, but there's a bunch of 
> refactoring here, 30 files touched, changes in diagnostic behavior, etc. 
> Perhaps best to put it off until after the 3.1.0 release.
>
>
>
>
> 
> From: Steve Lawrence 
> Sent: Monday, May 10, 2021 1:59 PM
> To: dev@daffodil.apache.org 
> Subject: Re: release 3.1.0 critical bugs still outstanding
>
> We have fixed the latter two mentioned issues. The current list of critical 
> issues is now:
>
> DAFFODIL-1422: disallow doctype decls in all XML & XSD that we read in
> DAFFODIL-2400: New SAX API causes performance degradations.
> DAFFODIL-2473: Need bit-wise AND, OR, NOT, and shift operations
>
> I agree that the first two can likely be postponed without issue. The last one 
> doesn't even seem critical to me, unless there are very important formats 
> that require these functions that I'm not aware of. I suggest we postpone 
> that ticket as well.
>
> If others agree, I think we are ready for the 3.1.0 release?
>
> Does anyone want to volunteer to be the release manager? I've done it a handful 
> of times so I don't mind, but it might be good to give others experience, 
> depending on availability. By this point, the workflow is pretty well 
> documented here:
>
> https://cwiki.apache.org/confluence/display/DAFFODIL/Release+Workflow
>
>
> On 5/3/21 5:25 PM, Beckerle, Mike wrote:
>> Of the 4 remaining "critical

sbt testOnly command line for running exactly one integration test?

2021-05-11 Thread Beckerle, Mike
I am wasting a bunch of time because just two integration tests are failing in 
my current work.

I can't get 'sbt it:testOnly ... ' to work.

I've used sbt testOnly before with regular tests.

Can it be made to work for integration tests?
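
Something along these lines ought to work (untested; the project and class
names here are just examples):

  sbt "daffodil-cli/it:testOnly org.apache.daffodil.SomeIntegrationTest"

or with a wildcard:

  sbt "daffodil-cli/it:testOnly *SomeIntegrationTest"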

I've asked questions about sbt testOnly before. This time I will make an sbt 
notes wiki page.

Second issue: the build-test-rebuild-test cycle for integration tests is 
terribly long.

Can one shortcut the full 'sbt stage' operation that makes the doc jars and 
such, e.g., is there a sub-operation that just refreshes the code jars?
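
One possibility (hedged, I have not timed it): scope stage to just the module
the tests exec, and let sbt's ~ rerun it on each change:

  sbt "~daffodil-cli/stage"

I don't know offhand whether our stage task can also be told to skip the doc
jars.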


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: release 3.1.0 critical bugs still outstanding

2021-05-10 Thread Beckerle, Mike
I agree we should do the release.

I am in the thick of debugging DAFFODIL-1422, but there's a bunch of 
refactoring here, 30 files touched, changes in diagnostic behavior, etc. 
Perhaps best to put it off until after the 3.1.0 release.





From: Steve Lawrence 
Sent: Monday, May 10, 2021 1:59 PM
To: dev@daffodil.apache.org 
Subject: Re: release 3.1.0 critical bugs still outstanding

We have fixed the latter two mentioned issues. The current list of
critical issues is now:

DAFFODIL-1422: disallow doctype decls in all XML & XSD that we read in
DAFFODIL-2400: New SAX API causes performance degradations.
DAFFODIL-2473: Need bit-wise AND, OR, NOT, and shift operations

I agree that the first two can likely be postponed without issue. The
last one doesn't even seem critical to me, unless there are very
important formats that require these functions that I'm not aware of. I
suggest we postpone that ticket as well.

If others agree, I think we are ready for the 3.1.0 release?

Does anyone want to volunteer to be the release manager? I've done it a
handful of times so I don't mind, but it might be good to give others
experience, depending on availability. By this point, the workflow is
pretty well documented here:

https://cwiki.apache.org/confluence/display/DAFFODIL/Release+Workflow


On 5/3/21 5:25 PM, Beckerle, Mike wrote:
> Of the 4 remaining "critical bugs or improvements" I think we should postpone 
> and release note these first two:
>
>   *   Improvement: https://issues.apache.org/jira/browse/DAFFODIL-2400 - New 
> SAX API causes performance degradations.
>  *   It is a mystery why the SAX API is slower; the whole point of SAX 
> is to be lighter weight.
>   *   Improvement: https://issues.apache.org/jira/browse/DAFFODIL-1422 - 
> disallow doctype decls in all XML and XSD we read in.
>  *   Assigned to Mike Beckerle. Unlikely to be finished in time for 
> release 3.1.0. Substantial code refactoring to do this right.
>
> These next two seem rather important to fix:
>
>   *   Bug: https://issues.apache.org/jira/browse/DAFFODIL-2183 - Unparse 
> complex nilled element fails
>  *   There are data formats where I have advised people that a best practice is 
> to use complex nilled elements to model a specific situation.
>   *   Bug: https://issues.apache.org/jira/browse/DAFFODIL-2399 - error 
> diagnostics output even though there is an infoset
>  *   This one is assigned to Steve Lawrence
>  *   Seems rather important. Was a user-reported issue I believe.
>
> The 5th critical ticket is for a new feature (bitwise and/or/xor, and shift 
> functions), so we can postpone that one.
>
> So only DAFFODIL-2183 really needs someone to take it on.
>
> 
> From: Interrante, John A (GE Research, US) 
> Sent: Monday, May 3, 2021 4:57 PM
> To: dev@daffodil.apache.org ...
>
> Are any of the 5 critical bugs (2 of which need developers to work on them) 
> going to hold up the 3.1.0 release?  The report doesn't say so, but I had the 
> impression you'd added the remaining critical bugs (which were unlikely to be 
> hit by people) to the 3.1.0 release notes so that the 3.1.0 release still 
> could go out.  If any critical bugs are holding up 3.1.0, please post links 
> to them so we can help if we have time.
>
> John
>
>
>



Re: Daffodil DAP debugger modules and repos

2021-05-07 Thread Beckerle, Mike
daffodil-vscode repo is now setup and working.

From: Beckerle, Mike 
Sent: Tuesday, May 4, 2021 6:07 PM
To: dev@daffodil.apache.org 
Subject: Re: Daffodil DAP debugger modules and repos

Grrr. I can't write to it however.

INFRA says it may just be an hour before the permissions propagate to it.

I hope to update this Wednesday, push a single file over there so people can 
fork it and get started.

I confirmed vscode is MIT license, which is Category A, as in "allowed".




____
From: Beckerle, Mike 
Sent: Tuesday, May 4, 2021 5:46 PM
To: dev@daffodil.apache.org 
Subject: Re: Daffodil DAP debugger modules and repos

Well I'd like to see this be in an Apache Daffodil repo.

In fact, I just created one. You can find it at

https://github.com/apache/daffodil-vscode

The DFDLSchemas organization is not directly analogous, as there are other DFDL 
implementations, and numerous schemas there were created by others for use with 
those implementations, e.g., EDIFACT, ISO 8583, etc. It also significantly 
pre-dates Apache Daffodil.





From: Adam Rosien 
Sent: Tuesday, May 4, 2021 5:36 PM
To: dev@daffodil.apache.org 
Subject: Daffodil DAP debugger modules and repos

I've been extending John's debugger prototype to support DAP, the debug
protocol supported by VS Code and other IDEs. There's an animated gif of
what the VS Code interaction looks like, where only the current schema
element and data position are relayed, at [1].

Now that we've made these first steps, we wanted the community's advice and
opinion about where the related code should live. Here is our initial
proposal:

* The VS Code extension would live in a separate repository,
`daffodil-vscode`. This is a common pattern with other extensions, and
would allow the extension to be released independently of daffodil itself.
However, I'm not sure what "organization" this would live under; this
situation is similar to auxiliary Daffodil repos like
https://github.com/DFDLSchemas.

* The Daffodil `Debugger` to DAP code *could* exist as a sub-module of the
main Daffodil project, say, `daffodil-dap`. We expect a lot of churn for
this code as we translate more and more of the Daffodil parsing state into
the DAP domain. There are a few new dependencies, like the java-debug
project that handles the DAP protocol, and helper code like cats and fs2
(for streaming).

That's the basics. We'd love to know if this fits or if you have some
better ideas.

.. Adam

[1] https://github.com/jw3/example-daffodil-debug/discussions/10


Re: Escape character parsing bug?

2021-05-06 Thread Beckerle, Mike
This sounds good to me. Less complexity, and therefore fewer tests, is a good thing.

From: Adams, Joshua 
Sent: Wednesday, May 5, 2021 5:05 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

So, after making the change to throw a Schema Definition Error whenever a 
terminator or separator begins with the escapeCharacter or 
escapeEscapeCharacter, around half of our escape scenario tests fail as they 
were all trying to test these weird edge cases for dealing with delimiters that 
start with the escapeCharacter or escapeEscapeCharacter.  I'm guessing that 
most of these tests can just be purged after a review to make sure we aren't 
losing coverage (other than this scenario where we are now throwing an SDE).  
Just wanted to get some opinions before moving forward with this change.

Josh

From: Adams, Joshua 
Sent: Tuesday, May 4, 2021 12:44 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

I'll begin making the change to add an SDE for these then.  It seems that most 
of the escape scheme tests that weren't round tripping were cases like this.

Josh

On May 4, 2021 12:15 PM, "Beckerle, Mike"  wrote:
I asked Steve Hanson of IBM - the other co-chair of the DFDL workgroup, and one 
of the primaries on one of IBM's DFDL implementations - and he said that when he 
tries this situation with the escape character "/" matching the start of the 
separator, he gets an SDE.

It appears not to be part of the DFDL spec to call this out as an SDE, so that 
omission will likely become the first erratum to the DFDL v1.0 official final 
spec.



From: Adams, Joshua 
Sent: Monday, May 3, 2021 3:35 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

Thanks for running this up the chain so to speak.  I agree that an SDE would 
probably be best for situations like this as I wouldn't think any sort of sane 
data format would use a combination of separators/escape characters like this.

Josh
________
From: Beckerle, Mike 
Sent: Monday, May 3, 2021 3:32 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

So you have a separator the first char of which is the escape character.

Yikes. I think the DFDL spec should, ideally, make this an SDE. Feels entirely 
ambiguous to me.

The part of the spec you quote is quite problematic, but was updated by one 
word in the final DFDL Spec version.

Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter, respectively.

So breaking that into two independent statements:

  1.  An escapeCharacter is removed unless it is preceded by the escape-escape.
  2.  An escape-escape is removed unless it does not precede the escape 
character.

So (1) means an escape char that is floating around not in front of any 
delimiter is removed.
(2) means an escape-escape floating around not in front of any escape char, is 
preserved.

That still doesn't help with your specific issue. If a delimiter begins with 
the escapeCharacter, will that delimiter appearing in the data be interpreted 
as an escape character followed by the 2nd and subsequent characters of the 
delimiter? Or will the delimiter be recognized?

Consider dfdl:separator="/ // ///" with escapeCharacter="/" and 
escapeEscapeCharacter="/"

What takes priority, interpretation of escapeCharacters and 
escapeEscapeCharacters or recognizing delimiters?

I have posed this issue for consideration of the other DFDL workgroup members 
and I'll report back.


From: Adams, Joshua 
Sent: Monday, May 3, 2021 2:38 PM
To: dev@daffodil.apache.org 
Subject: Escape character parsing bug?

Consider the following schema:

  [schema XML stripped by the mailing-list archive]

We then have the following test case (its XML was likewise stripped). The input
data is:

  foo$$/;bar

and the infoset in the test case is:

  foo$/;bar

Shouldn't this parse as:

  foo$$
  bar


The spec says the following:
On parsing any in-scope terminating delimiter encountered in the data
is not interpreted as such when it is immediately preceded by the
dfdl:escapeCharacter (when not itself preceded by the
dfdl:escapeEscapeCharacter). Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter.

It seems to me that the '/;' terminator shouldn't be getting escaped in this 
case, but I want to double check.

Josh




Re: Daffodil DAP debugger modules and repos

2021-05-04 Thread Beckerle, Mike
Grrr. I can't write to it however.

INFRA says it may just be an hour before the permissions propagate to it.

I hope to update this Wednesday, push a single file over there so people can 
fork it and get started.

I confirmed vscode is MIT license, which is Category A, as in "allowed".




____
From: Beckerle, Mike 
Sent: Tuesday, May 4, 2021 5:46 PM
To: dev@daffodil.apache.org 
Subject: Re: Daffodil DAP debugger modules and repos

Well I'd like to see this be in an Apache Daffodil repo.

In fact, I just created one. You can find it at

https://github.com/apache/daffodil-vscode

The DFDLSchemas organization is not directly analogous, as there are other DFDL 
implementations, and numerous schemas there were created by others for use with 
those implementations, e.g., EDIFACT, ISO 8583, etc. It also significantly 
pre-dates Apache Daffodil.





From: Adam Rosien 
Sent: Tuesday, May 4, 2021 5:36 PM
To: dev@daffodil.apache.org 
Subject: Daffodil DAP debugger modules and repos

I've been extending John's debugger prototype to support DAP, the debug
protocol supported by VS Code and other IDEs. There's an animated gif of
what the VS Code interaction looks like, where only the current schema
element and data position are relayed, at [1].

Now that we've made these first steps, we wanted the community's advice and
opinion about where the related code should live. Here is our initial
proposal:

* The VS Code extension would live in a separate repository,
`daffodil-vscode`. This is a common pattern with other extensions, and
would allow the extension to be released independently of daffodil itself.
However, I'm not sure what "organization" this would live under; this
situation is similar to auxiliary Daffodil repos like
https://github.com/DFDLSchemas.

* The Daffodil `Debugger` to DAP code *could* exist as a sub-module of the
main Daffodil project, say, `daffodil-dap`. We expect a lot of churn for
this code as we translate more and more of the Daffodil parsing state into
the DAP domain. There are a few new dependencies, like the java-debug
project that handles the DAP protocol, and helper code like cats and fs2
(for streaming).

That's the basics. We'd love to know if this fits or if you have some
better ideas.

.. Adam

[1] https://github.com/jw3/example-daffodil-debug/discussions/10


Re: Daffodil DAP debugger modules and repos

2021-05-04 Thread Beckerle, Mike
Well I'd like to see this be in an Apache Daffodil repo.

In fact, I just created one. You can find it at

https://github.com/apache/daffodil-vscode

The DFDLSchemas organization is not directly analogous, as there are other DFDL 
implementations, and numerous schemas there were created by others for use with 
those implementations, e.g., EDIFACT, ISO 8583, etc. It also significantly 
pre-dates Apache Daffodil.





From: Adam Rosien 
Sent: Tuesday, May 4, 2021 5:36 PM
To: dev@daffodil.apache.org 
Subject: Daffodil DAP debugger modules and repos

I've been extending John's debugger prototype to support DAP, the debug
protocol supported by VS Code and other IDEs. There's an animated gif of
what the VS Code interaction looks like, where only the current schema
element and data position are relayed, at [1].

Now that we've made these first steps, we wanted the community's advice and
opinion about where the related code should live. Here is our initial
proposal:

* The VS Code extension would live in a separate repository,
`daffodil-vscode`. This is a common pattern with other extensions, and
would allow the extension to be released independently of daffodil itself.
However, I'm not sure what "organization" this would live under; this
situation is similar to auxiliary Daffodil repos like
https://github.com/DFDLSchemas.

* The Daffodil `Debugger` to DAP code *could* exist as a sub-module of the
main Daffodil project, say, `daffodil-dap`. We expect a lot of churn for
this code as we translate more and more of the Daffodil parsing state into
the DAP domain. There are a few new dependencies, like the java-debug
project that handles the DAP protocol, and helper code like cats and fs2
(for streaming).

That's the basics. We'd love to know if this fits or if you have some
better ideas.

.. Adam

[1] https://github.com/jw3/example-daffodil-debug/discussions/10


Re: Escape character parsing bug?

2021-05-04 Thread Beckerle, Mike
I asked Steve Hanson of IBM - the other co-chair of the DFDL workgroup, and one 
of the primaries on one of IBM's DFDL implementations - and he said that when he 
tries this situation with the escape character "/" matching the start of the 
separator, he gets an SDE.

It appears not to be part of the DFDL spec to call this out as an SDE, so that 
omission will likely become the first erratum to the DFDL v1.0 official final 
spec.



From: Adams, Joshua 
Sent: Monday, May 3, 2021 3:35 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

Thanks for running this up the chain so to speak.  I agree that an SDE would 
probably be best for situations like this as I wouldn't think any sort of sane 
data format would use a combination of separators/escape characters like this.

Josh
____
From: Beckerle, Mike 
Sent: Monday, May 3, 2021 3:32 PM
To: dev@daffodil.apache.org 
Subject: Re: Escape character parsing bug?

So you have a separator the first char of which is the escape character.

Yikes. I think the DFDL spec should, ideally, make this an SDE. Feels entirely 
ambiguous to me.

The part of the spec you quote is quite problematic, but was updated by one 
word in the final DFDL Spec version.

Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter, respectively.

So breaking that into two independent statements:

  1.  An escapeCharacter is removed unless it is preceded by the escape-escape.
  2.  An escape-escape is removed unless it does not precede the escape 
character.

So (1) means an escape char that is floating around not in front of any 
delimiter is removed.
(2) means an escape-escape floating around not in front of any escape char, is 
preserved.

That still doesn't help with your specific issue. If a delimiter begins with 
the escapeCharacter, will that delimiter appearing in the data be interpreted 
as an escape character followed by the 2nd and subsequent characters of the 
delimiter? Or will the delimiter be recognized?

Consider dfdl:separator="/ // ///" with escapeCharacter="/" and 
escapeEscapeCharacter="/"

What takes priority, interpretation of escapeCharacters and 
escapeEscapeCharacters or recognizing delimiters?

I have posed this issue for consideration of the other DFDL workgroup members 
and I'll report back.


From: Adams, Joshua 
Sent: Monday, May 3, 2021 2:38 PM
To: dev@daffodil.apache.org 
Subject: Escape character parsing bug?

Consider the following schema:

  [schema XML stripped by the mailing-list archive]

We then have the following test case (its XML was likewise stripped). The input
data is:

  foo$$/;bar

and the infoset in the test case is:

  foo$/;bar

Shouldn't this parse as:

  foo$$
  bar


The spec says the following:
On parsing any in-scope terminating delimiter encountered in the data
is not interpreted as such when it is immediately preceded by the
dfdl:escapeCharacter (when not itself preceded by the
dfdl:escapeEscapeCharacter). Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter.

It seems to me that the '/;' terminator shouldn't be getting escaped in this 
case, but I want to double check.

Josh




release 3.1.0 critical bugs still outstanding

2021-05-03 Thread Beckerle, Mike
Of the 4 remaining "critical bugs or improvements" I think we should postpone 
and release note these first two:

  *   Improvement: https://issues.apache.org/jira/browse/DAFFODIL-2400 - New 
SAX API causes performance degradations.
 *   It is a mystery why the SAX API is slower; the whole point of SAX is to be 
lighter weight.
  *   Improvement: https://issues.apache.org/jira/browse/DAFFODIL-1422 - 
disallow doctype decls in all XML and XSD we read in.
 *   Assigned to Mike Beckerle. Unlikely to be finished in time for release 
3.1.0. Substantial code refactoring to do this right.

These next two seem rather important to fix:

  *   Bug: https://issues.apache.org/jira/browse/DAFFODIL-2183 - Unparse 
complex nilled element fails
 *   There are data formats where I have advised people that a best practice is 
to use complex nilled elements to model a specific situation.
  *   Bug: https://issues.apache.org/jira/browse/DAFFODIL-2399 - error 
diagnostics output even though there is an infoset
 *   This one is assigned to Steve Lawrence
 *   Seems rather important. Was a user-reported issue I believe.

The 5th critical ticket is for a new feature (bitwise and/or/xor, and shift 
functions), so we can postpone that one.

So only DAFFODIL-2183 really needs someone to take it on.


From: Interrante, John A (GE Research, US) 
Sent: Monday, May 3, 2021 4:57 PM
To: dev@daffodil.apache.org ...

Are any of the 5 critical bugs (2 of which need developers to work on them) 
going to hold up the 3.1.0 release?  The report doesn't say so, but I had the 
impression you'd added the remaining critical bugs (which were unlikely to be 
hit by people) to the 3.1.0 release notes so that the 3.1.0 release still could 
go out.  If any critical bugs are holding up 3.1.0, please post links to them 
so we can help if we have time.

John




draft may board report - due on 12th.

2021-05-03 Thread Beckerle, Mike
Here's what I have planned for the may board report.
Your feedback is welcome.

## Description:
Implementation of Data Format Description Language (DFDL) used to convert data
from native formats into more easily processed forms such as XML, JSON, or the
structures carried by data-processing fabrics.

## Issues:
There are no issues requiring board attention.

## Membership Data:
Apache Daffodil was founded 2021-02-16 (3 months ago)
There are currently 13 committers and 12 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:6.

Community changes, past quarter:
- No new PMC members (project graduated recently).
- No new committers were added.

## Project Activity:
This is our third board report since graduation to a TLP.

The project continues to move towards its first TLP release which will be 3.1.0.

The major features needed are in place, including:

* the first version of the C-code generator - a new lightweight,
  small-footprint C-code generator that we hope will attract new contributors
* raw validation output
* command-line debugger improvements

A small number (5) of critical bugs remain outstanding, 2 of which need
developers to work on them.

## Community Health:
We have 3 new contributors who have expressed interest and started work.
Our hope is of course that they will stay engaged and become committers.

Our stats look healthier this month across the board.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: Escape character parsing bug?

2021-05-03 Thread Beckerle, Mike
So you have a separator the first char of which is the escape character.

Yikes. I think the DFDL spec should, ideally, make this an SDE. Feels entirely 
ambiguous to me.

The part of the spec you quote is quite problematic, but was updated by one 
word in the final DFDL Spec version.

Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter, respectively.

So breaking that into two independent statements:

  1.  An escapeCharacter is removed unless it is preceded by the escape-escape.
  2.  An escape-escape is removed unless it does not precede the escape 
character.

So (1) means an escape char that is floating around not in front of any 
delimiter is removed.
(2) means an escape-escape floating around not in front of any escape char, is 
preserved.
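
Concretely, here is my reading of (1) and (2), with escapeCharacter="$",
escapeEscapeCharacter="%", and separator "," (a made-up example, not from the
spec):

  a$,b   ->  one field "a,b"          ($ escapes the separator, $ is removed)
  a%$,b  ->  fields "a$" and "b"      (% escapes the $, so the separator is live)
  a$b    ->  one field "ab"           (stray $ removed, per (1))
  a%b    ->  one field "a%b"          (stray % preserved, per (2))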

That still doesn't help with your specific issue. If a delimiter begins with 
the escapeCharacter, will that delimiter appearing in the data be interpreted 
as an escape character followed by the 2nd and subsequent characters of the 
delimiter? Or will the delimiter be recognized?

Consider dfdl:separator="/ // ///" with escapeCharacter="/" and 
escapeEscapeCharacter="/"

What takes priority, interpretation of escapeCharacters and 
escapeEscapeCharacters or recognizing delimiters?

I have posed this issue for consideration of the other DFDL workgroup members 
and I'll report back.


From: Adams, Joshua 
Sent: Monday, May 3, 2021 2:38 PM
To: dev@daffodil.apache.org 
Subject: Escape character parsing bug?

Consider the following schema:

  [schema XML stripped by the mailing-list archive]

We then have the following test case (its XML was likewise stripped). The input
data is:

  foo$$/;bar

and the infoset in the test case is:

  foo$/;bar

Shouldn't this parse as:

  foo$$
  bar


The spec says the following:
On parsing any in-scope terminating delimiter encountered in the data
is not interpreted as such when it is immediately preceded by the
dfdl:escapeCharacter (when not itself preceded by the
dfdl:escapeEscapeCharacter). Occurrences of the
dfdl:escapeCharacter and dfdl:escapeEscapeCharacter are removed
from the data, unless the dfdl:escapeCharacter is preceded by the
dfdl:escapeEscapeCharacter, or the dfdl:escapeEscapeCharacter
does not precede the dfdl:escapeCharacter.

It seems to me that the '/;' terminator shouldn't be getting escaped in this 
case, but I want to double check.

Josh




Re: flakey windows CI build? Or real issue?

2021-04-29 Thread Beckerle, Mike
Wow, they even coined a term for this macro stuff... "check-enabled-idiom".

It says Apache License 2.0 in the License file.

From: Steve Lawrence 
Sent: Thursday, April 29, 2021 3:51 PM
To: dev@daffodil.apache.org 
Subject: Re: flakey windows CI build? Or real issue?

Good point. It looks like scala-logging has that macro stuff, and is a
wrapper for SLF4J, so I would assume it could be easily used by Java
applications:

https://github.com/lightbend/scala-logging

I haven't looked at license/dependency of that, but something like that
might work.
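
For example, usage looks roughly like this (a sketch based on scala-logging's
README; expensiveDump is a stand-in, and an SLF4J backend such as logback must
be on the classpath for any output to appear):

  // build.sbt
  libraryDependencies += "com.typesafe.scala-logging" %% "scala-logging" % "3.9.2"

  import com.typesafe.scalalogging.Logger

  object LogDemo {
    val logger = Logger("daffodil")
    def expensiveDump(): String = { Thread.sleep(1000); "big state dump" }

    def main(args: Array[String]): Unit = {
      // debug() is a macro: the interpolated string (and expensiveDump())
      // is only evaluated if debug logging is enabled in the backend.
      logger.debug(s"parse state: ${expensiveDump()}")
    }
  }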

On 4/29/21 3:31 PM, Beckerle, Mike wrote:
> I have no problem using someone else's logging infrastructure.
>
> The only sort-of-requirement is I've always hated the overhead of logging 
> because to create a good log message you end up doing a bunch of work and 
> then you pass that to the logger which says "not at the log level where that 
> is needed", and throws it all away.
>
> The reason for the logging macro is to lower the overhead so that logging like
>
> log(SomeLevel, formatStringExpr, arg1Expr, arg2Expr, ...)
>
> imagine those "...Expr" things are in fact expressions, perhaps with some 
> cost to lookup the offending things etc. They may access lazy vals that have 
> to be computed, for example.
>
> You really want this to behave as if this was what was written:
>
> if (SomeLevel >= LoggingLevel)
>   log(formatStringExpr, arg1Expr, arg2Expr, ...)
>
> So that none of the cost of computing the arg expressions is encountered 
> unless you are at a log level where they are needed.
>
> That's what the macro does. Just hoists the if test above the evaluation of 
> all those expressions.
>
> We can certainly still do that even if the underlying logger is one of the 
> conventional ones popular in the java world.
>
>
> 
> From: Steve Lawrence 
> Sent: Wednesday, April 28, 2021 8:22 AM
> To: dev@daffodil.apache.org 
> Subject: Re: flakey windows CI build? Or real issue?
>
> Maybe we should consider dropping our own logging implementation and use
> some existing logging library. Other people have put a lot more time and
> thought into logging than we have. And I don't think Daffodil has any
> special logging requirements that other loggers don't already have.
>
> Thoughts?
>
>
> On 4/27/21 7:28 PM, Beckerle, Mike wrote:
>> Logging is highly suspicious for race conditions to me.
>>
>> This whole design is completely non-thread safe, and just doesn't make 
>> sense. I think "with Logging" was just copied as a pattern from place to 
>> place.
>>
>> I just created https://issues.apache.org/jira/browse/DAFFODIL-2510 for this 
>> issue.
>> 
>> From: Beckerle, Mike 
>> Sent: Tuesday, April 27, 2021 3:28 PM
>> To: dev@daffodil.apache.org 
>> Subject: Re: flakey windows CI build? Or real issue?
>>
>> This one line:
>>
>> [error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
>> expected:<0> but was:<1>, took 0.307 sec
>>
>> For that test to fail an assertEquals, but only on one platform, and not 
>> reproducibly, is very disconcerting.
>>
>> The test has exactly 3 assertEquals that compare against 0.
>>
>>   @Test
>>   def testScalaAPI2(): Unit = {
>> val lw = new LogWriterForSAPITest()
>>
>> Daffodil.setLogWriter(lw)
>> Daffodil.setLoggingLevel(LogLevel.Info)
>>
>> ...
>>
>> val res = dp.parse(input, outputter)
>>
>>...
>> assertEquals(0, lw.errors.size)
>> assertEquals(0, lw.warnings.size)
>> assertEquals(0, lw.others.size)
>>
>> // reset the global logging state
>> Daffodil.setLogWriter(new ConsoleLogWriter())
>> Daffodil.setLoggingLevel(LogLevel.Info)
>>   }
>>
>> So this test is failing sporadically because of something being written to 
>> the logWriter (lw) that wasn't before.
>>
>> 
>> From: Interrante, John A (GE Research, US) 
>> Sent: Tuesday, April 27, 2021 2:47 PM
>> To: dev@daffodil.apache.org 
>> Subject: flakey windows CI build? Or real issue?
>>
>> Once you drill down into and expand the "Run Unit Tests" log, GitHub lets 
>> you search that log with a magnifying lens icon and input search text box 
>> above the log.  Searching for "failed:" makes it easier to find the specific 
>> failures.  I found one failure and three warnings:
>>
>> [error] Test org.apache.daffodil.example.Tes

Re: flakey windows CI build? Or real issue?

2021-04-29 Thread Beckerle, Mike
I have no problem using someone else's logging infrastructure.

The only sort-of-requirement is I've always hated the overhead of logging 
because to create a good log message you end up doing a bunch of work and then 
you pass that to the logger which says "not at the log level where that is 
needed", and throws it all away.

The reason for the logging macro is to lower the overhead so that logging like

log(SomeLevel, formatStringExpr, arg1Expr, arg2Expr, ...)

imagine those "...Expr" things are in fact expressions, perhaps with some cost 
to lookup the offending things etc. They may access lazy vals that have to be 
computed, for example.

You really want this to behave as if this was what was written:

if (SomeLevel >= LoggingLevel)
  log(formatStringExpr, arg1Expr, arg2Expr, ...)

So that none of the cost of computing the arg expressions is encountered unless 
you are at a log level where they are needed.

That's what the macro does. Just hoists the if test above the evaluation of all 
those expressions.

We can certainly still do that even if the underlying logger is one of the 
conventional ones popular in the java world.
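
For illustration, the same hoisting can be had without macros via by-name
parameters (a minimal sketch, not Daffodil's actual logging API):

  object LazyLog {
    val Info = 1; val Debug = 2
    var loggingLevel: Int = Info

    def expensiveDump(): String = { Thread.sleep(1000); "big state dump" }

    // msg is by-name: the argument expression is never evaluated unless
    // the level check passes -- the same effect the macro achieves.
    def log(level: Int, msg: => String): Unit =
      if (level <= loggingLevel) println(msg)

    def main(args: Array[String]): Unit = {
      log(Debug, s"state: ${expensiveDump()}") // expensiveDump() never runs at Info
      log(Info, "parse complete")              // this one prints
    }
  }

The macro version additionally avoids allocating the by-name thunk on every
call, which is why a macro is still preferable on hot paths.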



From: Steve Lawrence 
Sent: Wednesday, April 28, 2021 8:22 AM
To: dev@daffodil.apache.org 
Subject: Re: flakey windows CI build? Or real issue?

Maybe we should consider dropping our own logging implementation and use
some existing logging library. Other people have put a lot more time and
thought into logging than we have. And I don't think Daffodil has any
special logging requirements that other loggers don't already have.

Thoughts?


On 4/27/21 7:28 PM, Beckerle, Mike wrote:
> Logging is highly suspicious for race conditions to me.
>
> This whole design is completely non-thread safe, and just doesn't make sense. 
> I think "with Logging" was just copied as a pattern from place to place.
>
> I just created https://issues.apache.org/jira/browse/DAFFODIL-2510 for this 
> issue.
> 
> From: Beckerle, Mike 
> Sent: Tuesday, April 27, 2021 3:28 PM
> To: dev@daffodil.apache.org 
> Subject: Re: flakey windows CI build? Or real issue?
>
> This one line:
>
> [error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
> expected:<0> but was:<1>, took 0.307 sec
>
> For that test to fail an assertEquals, but only on one platform, and not 
> reproducibly, is very disconcerting.
>
> The test has exactly 3 assertEquals that compare against 0.
>
>   @Test
>   def testScalaAPI2(): Unit = {
> val lw = new LogWriterForSAPITest()
>
> Daffodil.setLogWriter(lw)
> Daffodil.setLoggingLevel(LogLevel.Info)
>
> ...
>
> val res = dp.parse(input, outputter)
>
>...
> assertEquals(0, lw.errors.size)
> assertEquals(0, lw.warnings.size)
> assertEquals(0, lw.others.size)
>
> // reset the global logging state
> Daffodil.setLogWriter(new ConsoleLogWriter())
> Daffodil.setLoggingLevel(LogLevel.Info)
>   }
>
> So this test is failing sporadically because of something being written to 
> the logWriter (lw) that wasn't before.
>
> 
> From: Interrante, John A (GE Research, US) 
> Sent: Tuesday, April 27, 2021 2:47 PM
> To: dev@daffodil.apache.org 
> Subject: flakey windows CI build? Or real issue?
>
> Once you drill down into and expand the "Run Unit Tests" log, GitHub lets you 
> search that log with a magnifying lens icon and input search text box above 
> the log.  Searching for "failed:" makes it easier to find the specific 
> failures.  I found one failure and three warnings:
>
> [error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
> expected:<0> but was:<1>, took 0.307 sec
>
> [warn] Test assumption in test 
> org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_1 failed: 
> org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
> 'test_sep_ssp_never_1' not compatible with implementation., took 0.033 sec
> [warn] Test assumption in test 
> org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_3 failed: 
> org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
> 'test_sep_ssp_never_3' not compatible with implementation., took 0.005 sec
> [warn] Test assumption in test 
> org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_4 failed: 
> org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
> 'test_sep_ssp_never_4' not compatible with implementation., took 0.003 sec
>
> Your previous run failed in the Windows Java 11 build's Compile step with a 
> http 504 error when sbt was trying to fetch artifacts:
>
> [error] 
> lmco

need 2nd reviewer on PR

2021-04-29 Thread Beckerle, Mike
https://github.com/apache/daffodil/pull/539

Needs a second reviewer.

I added some "Highlights" comments to the files diffs to help you surf the 
deltas more effectively.

This fixes an issue a user was having doing streaming reads of messages.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: flakey windows CI build? Or real issue?

2021-04-27 Thread Beckerle, Mike
Logging is highly suspicious for race conditions to me.

This whole design is completely non-thread safe, and just doesn't make sense. I 
think "with Logging" was just copied as a pattern from place to place.

I just created https://issues.apache.org/jira/browse/DAFFODIL-2510 for this 
issue.
____
From: Beckerle, Mike 
Sent: Tuesday, April 27, 2021 3:28 PM
To: dev@daffodil.apache.org 
Subject: Re: flakey windows CI build? Or real issue?

This one line:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
expected:<0> but was:<1>, took 0.307 sec

For that test to fail an assertEquals, but only on one platform, and not 
reproducibly, is very disconcerting.

The test has exactly 3 assertEquals that compare against 0.

  @Test
  def testScalaAPI2(): Unit = {
val lw = new LogWriterForSAPITest()

Daffodil.setLogWriter(lw)
Daffodil.setLoggingLevel(LogLevel.Info)

...

val res = dp.parse(input, outputter)

   ...
assertEquals(0, lw.errors.size)
assertEquals(0, lw.warnings.size)
assertEquals(0, lw.others.size)

// reset the global logging state
Daffodil.setLogWriter(new ConsoleLogWriter())
Daffodil.setLoggingLevel(LogLevel.Info)
  }

So this test is failing sporadically because of something being written to the 
logWriter (lw) that wasn't before.


From: Interrante, John A (GE Research, US) 
Sent: Tuesday, April 27, 2021 2:47 PM
To: dev@daffodil.apache.org 
Subject: flakey windows CI build? Or real issue?

Once you drill down into and expand the "Run Unit Tests" log, GitHub lets you 
search that log with a magnifying lens icon and input search text box above the 
log.  Searching for "failed:" makes it easier to find the specific failures.  I 
found one failure and three warnings:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
expected:<0> but was:<1>, took 0.307 sec

[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_1 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_1' not compatible with implementation., took 0.033 sec
[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_3 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_3' not compatible with implementation., took 0.005 sec
[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_4 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_4' not compatible with implementation., took 0.003 sec

Your previous run failed in the Windows Java 11 build's Compile step with a 
http 504 error when sbt was trying to fetch artifacts:

[error] 
lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: 
Error fetching artifacts:
[error] 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar:
 download error: Caught java.io.IOException: Server returned HTTP response 
code: 504 for URL: 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar
 (Server returned HTTP response code: 504 for URL: 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar)
 while downloading 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar

That error probably is just a flaky network or server problem.

John

-Original Message-
From: Steve Lawrence 
Sent: Tuesday, April 27, 2021 2:17 PM
To: dev@daffodil.apache.org
Subject: EXT: Re: flakey windows CI build? Or real issue?

I haven't seen failures in tests in a while, only thing I've noticed is GitHub 
actions just stalling with no output.

In the linked PR, I see the error:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2
failed: expected:<0> but was:<1>, took 0.307 sec

I wonder if these isAtEnd changes have introduced a race condition, or made an 
existing race condition more likely to get hit?

On 4/27/21 2:13 PM, Beckerle, Mike wrote:
> My PR keeps failing to build on Windows. E.g., this failed the Windows
> Java 8 build:
> https://github.com/mbeckerle/daffodil/actions/runs/789865909
>
> Previously today it failed the windows java 11 build.
>
> The errors were different. Earlier today it was in daffodil-io, the
> linked checks above it's in daffodil-sapi.
>
> In neither case is there an [error] identifying the specific test
> failing. Only

Re: codecov - annotation to suppress false positives?

2021-04-27 Thread Beckerle, Mike
Nevermind. I was missing the trailing $ on my annotations. They're supposed to 
be // $COVERAGE-OFF$ and // $COVERAGE-ON$
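
So for the earlier example, the working form is:

  foo match {
    case real1 => ...
    case real2 => ...
    // $COVERAGE-OFF$
    case thingy => Assert.invariantFailed("...msg...")
    // $COVERAGE-ON$
  }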

From: Beckerle, Mike 
Sent: Tuesday, April 27, 2021 5:07 PM
To: dev@daffodil.apache.org 
Subject: Re: codecov - annotation to suppress false positives?

The codecov suppression using // $COVERAGE-OFF/ON did not work.

https://github.com/apache/daffodil/pull/539/files#diff-3fd1e3ba2fc61e010a380dd6ae37d03e861205f391bf5cd110e54a7c912d2067

When I view it, it shows a codecov warning, right in the middle of a block 
surrounded by the Coverage off/on comments.

So that suppression of the codecov report for these annotations did not work.

Is this a feature we need to enable in codecov?



From: Interrante, John A (GE Research, US) 
Sent: Tuesday, April 27, 2021 1:52 PM
To: dev@daffodil.apache.org 
Subject: codecov - annotation to suppress false positives?

I think you'd want "informational" mode, not "only_pulls".  It looks like the 
latter might lose the information from merged commits:

only_pulls
Only post a status to pull requests, defaults to false. If true no 
status will be posted for commits not on a pull request.

If code coverage jumps well above the 80% cutoff once DAFFODIL-2509 gets done, 
we'll probably want to stop setting "informational" mode.

-Original Message-
From: Steve Lawrence 
Sent: Tuesday, April 27, 2021 1:32 PM
To: dev@daffodil.apache.org
Subject: EXT: Re: codecov - annotation to suppress false positives?

Thanks!

Related to codecov, does anyone have any thoughts on changing codecov.io 
settings so the GitHub actions are only in the "informational" mode:

  https://docs.codecov.io/docs/commit-status#informational

This way if there are any missing lines of code, it won't fail the check. I 
think we would still get the inline notices saying that a line has missing 
coverage, but a PR will still show the check as passing, and commits will also 
show a pass.

Alternatively we could set the "only_pulls" setting:

  https://docs.codecov.io/docs/commit-status#only_pulls

This way PR codecov checks can still fail the check if not sufficiently 
covered, but if we decide that's fine and merge it then it won't cause the 
merged commit to fail the check.
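
If we go the informational route, my understanding from the codecov docs is
that it's a codecov.yml setting, roughly:

  coverage:
    status:
      project:
        default:
          informational: true
      patch:
        default:
          informational: true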


On 4/27/21 1:20 PM, Beckerle, Mike wrote:
> Created https://issues.apache.org/jira/browse/DAFFODIL-2509
> about adding these coverage exceptions uniformly for all the Assert.xyzzy 
> where it is applicable.
> 
> From: Steve Lawrence 
> Sent: Tuesday, April 27, 2021 12:45 PM
> To: dev@daffodil.apache.org 
> Subject: Re: codecov - annotation to suppress false positives?
>
> We use the sbt-scoverage plugin for generating coverage measurements
> before sending them to codecov.io for display. It looks like this does
> have a way to exclude packages and sections of code:
>
> https://github.com/scoverage/sbt-scoverage#exclude-classes-and-packages
>
> So we could maybe do something like:
>
>   foo match {
> case real1 => ...
> case real2 => ...
> // $COVERAGE-OFF
> case thingy =>  {
>   Assert.invariantFailed(".msg...")
> }
> // $COVERAGE-ON
>   }
>
> To exclude the entire case that should never be hit?
>
>
> On 4/27/21 10:29 AM, Beckerle, Mike wrote:
>> We have assertions like:
>>
>> foo match {
>> ... real cases 
>> case thingy =>  Assert.invariantFailed(".msg...")
>> }
>>
>> The same thing can happen with if-then-else logic obviously where you
>> make a decision, and some paths through the logic can't occur.
>>
>> These always get marked as non-covered, because by nature they're
>> never supposed to happen.
>>
>> Is there a structured comment or some way to tell codecov that this
>> is ok, and not to issue a warning about this line?
>>
>> Mike Beckerle | Principal Engineer
>>
>> mbecke...@owlcyberdefense.com
>>
>> P +1-781-330-0412
>>
>
>



Re: codecov - annotation to suppress false positives?

2021-04-27 Thread Beckerle, Mike
The codecov suppression using // $COVERAGE-OFF/ON did not work.

https://github.com/apache/daffodil/pull/539/files#diff-3fd1e3ba2fc61e010a380dd6ae37d03e861205f391bf5cd110e54a7c912d2067

When I view it, it shows a codecov warning, right in the middle of a block 
surrounded by the Coverage off/on comments.

So that suppression of the codecov report for these annotations did not work.

Is this a feature we need to enable in codecov?



From: Interrante, John A (GE Research, US) 
Sent: Tuesday, April 27, 2021 1:52 PM
To: dev@daffodil.apache.org 
Subject: codecov - annotation to suppress false positives?

I think you'd want "informational" mode, not "only_pulls".  It looks like the 
latter might lose the information from merged commits:

only_pulls
Only post a status to pull requests, defaults to false. If true no 
status will be posted for commits not on a pull request.

If code coverage jumps well above the 80% cutoff once DAFFODIL-2509 gets done, 
we'll probably want to stop setting "informational" mode.

-Original Message-
From: Steve Lawrence 
Sent: Tuesday, April 27, 2021 1:32 PM
To: dev@daffodil.apache.org
Subject: EXT: Re: codecov - annotation to suppress false positives?

Thanks!

Related to codecov, does anyone have any thoughts on changing codecov.io 
settings so the GitHub actions are only in the "informational" mode:

  https://docs.codecov.io/docs/commit-status#informational

This way if there are any missing lines of code, it won't fail the check. I 
think we would still get the inline notices saying that a line has missing 
coverage, but a PR will still show the check as passing, and commits will also 
show a pass.

Alternatively we could set the "only_pulls" setting:

  https://docs.codecov.io/docs/commit-status#only_pulls

This way PR codecov checks can still fail the check if not sufficiently 
covered, but if we decide that's fine and merge it then it won't cause the 
merged commit to fail the check.


On 4/27/21 1:20 PM, Beckerle, Mike wrote:
> Created https://issues.apache.org/jira/browse/DAFFODIL-2509
> about adding these coverage exceptions uniformly for all the Assert.xyzzy 
> where it is applicable.
> 
> From: Steve Lawrence 
> Sent: Tuesday, April 27, 2021 12:45 PM
> To: dev@daffodil.apache.org 
> Subject: Re: codecov - annotation to suppress false positives?
>
> We use the sbt-scoverage plugin for generating coverage measurements
> before sending them to codecov.io for display. It looks like this does
> have a way to exclude packages and sections of code:
>
> https://github.com/scoverage/sbt-scoverage#exclude-classes-and-packages
>
> So we could maybe do something like:
>
>   foo match {
> case real1 => ...
> case real2 => ...
> // $COVERAGE-OFF
> case thingy =>  {
>   Assert.invariantFailed(".msg...")
> }
> // $COVERAGE-ON
>   }
>
> To exclude the entire case that should never be hit?
>
>
> On 4/27/21 10:29 AM, Beckerle, Mike wrote:
>> We have assertions like:
>>
>> foo match {
>> ... real cases 
>> case thingy =>  Assert.invariantFailed(".msg...")
>> }
>>
>> The same thing can happen with if-then-else logic obviously where you
>> make a decision, and some paths through the logic can't occur.
>>
>> These always get marked as non-covered, because by nature they're
>> never supposed to happen.
>>
>> Is there a structured comment or some way to tell codecov that this
>> is ok, and not to issue a warning about this line?
>>
>> Mike Beckerle | Principal Engineer
>>
>> mbecke...@owlcyberdefense.com
>>
>> P +1-781-330-0412
>>
>
>



Re: flakey windows CI build? Or real issue?

2021-04-27 Thread Beckerle, Mike
This one line:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
expected:<0> but was:<1>, took 0.307 sec

For that test to fail an assertEquals, but only on one platform, and not 
reproducibly, is very disconcerting.

The test has exactly 3 assertEquals that compare against 0.

  @Test
  def testScalaAPI2(): Unit = {
val lw = new LogWriterForSAPITest()

Daffodil.setLogWriter(lw)
Daffodil.setLoggingLevel(LogLevel.Info)

...

val res = dp.parse(input, outputter)

   ...
assertEquals(0, lw.errors.size)
assertEquals(0, lw.warnings.size)
assertEquals(0, lw.others.size)

// reset the global logging state
Daffodil.setLogWriter(new ConsoleLogWriter())
Daffodil.setLoggingLevel(LogLevel.Info)
  }

So this test is failing sporadically because of something being written to the 
logWriter (lw) that wasn't before.


From: Interrante, John A (GE Research, US) 
Sent: Tuesday, April 27, 2021 2:47 PM
To: dev@daffodil.apache.org 
Subject: flakey windows CI build? Or real issue?

Once you drill down into and expand the "Run Unit Tests" log, GitHub lets you 
search that log with a magnifying lens icon and input search text box above the 
log.  Searching for "failed:" makes it easier to find the specific failures.  I 
found one failure and three warnings:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2 failed: 
expected:<0> but was:<1>, took 0.307 sec

[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_1 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_1' not compatible with implementation., took 0.033 sec
[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_3 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_3' not compatible with implementation., took 0.005 sec
[warn] Test assumption in test 
org.apache.daffodil.usertests.TestSepTests.test_sep_ssp_never_4 failed: 
org.junit.AssumptionViolatedException: (Implementation: daffodil) Test 
'test_sep_ssp_never_4' not compatible with implementation., took 0.003 sec

Your previous run failed in the Windows Java 11 build's Compile step with a 
http 504 error when sbt was trying to fetch artifacts:

[error] 
lmcoursier.internal.shaded.coursier.error.FetchError$DownloadingArtifacts: 
Error fetching artifacts:
[error] 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar:
 download error: Caught java.io.IOException: Server returned HTTP response 
code: 504 for URL: 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar
 (Server returned HTTP response code: 504 for URL: 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar)
 while downloading 
https://repo.scala-sbt.org/scalasbt/sbt-plugin-releases/com.typesafe.sbt/sbt-native-packager/scala_2.12/sbt_1.0/1.8.1/jars/sbt-native-packager.jar

That error probably is just a flaky network or server problem.

John

-Original Message-
From: Steve Lawrence 
Sent: Tuesday, April 27, 2021 2:17 PM
To: dev@daffodil.apache.org
Subject: EXT: Re: flakey windows CI build? Or real issue?

I haven't seen failures in tests in a while, only thing I've noticed is GitHub 
actions just stalling with no output.

In the linked PR, I see the error:

[error] Test org.apache.daffodil.example.TestScalaAPI.testScalaAPI2
failed: expected:<0> but was:<1>, took 0.307 sec

I wonder if these isAtEnd changes have introduced a race condition, or made an 
existing race condition more likely to get hit?

On 4/27/21 2:13 PM, Beckerle, Mike wrote:
> My PR keeps failing to build on Windows. E.g., this failed the Windows
> Java 8 build:
> https://github.com/mbeckerle/daffodil/actions/runs/789865909
>
> Previously today it failed the windows java 11 build.
>
> The errors were different. Earlier today it was in daffodil-io, the
> linked checks above it's in daffodil-sapi.
>
> In neither case is there an [error] identifying the specific test
> failing. Only a summary at the end indicating there were failures in that 
> module.
>
> Is any of this expected behavior? I've seen mostly all 6 standard CI
> checks working of late on others' PRs.
>
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com
>
> P +1-781-330-0412
>



flakey windows CI build? Or real issue?

2021-04-27 Thread Beckerle, Mike
My PR keeps failing to build on Windows. E.g., this failed the Windows Java 8 build:
https://github.com/mbeckerle/daffodil/actions/runs/789865909

Previously today it failed the Windows Java 11 build.

The errors were different. Earlier today it was in daffodil-io; in the linked 
checks above it's in daffodil-sapi.

In neither case is there an [error] identifying the specific test failing. Only 
a summary at the end indicating there were failures in that module.

Is any of this expected behavior? I've seen mostly all 6 standard CI checks 
working of late on others' PRs.


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: codecov - annotation to suppress false positives?

2021-04-27 Thread Beckerle, Mike
Created https://issues.apache.org/jira/browse/DAFFODIL-2509
about adding these coverage exceptions uniformly for all the Assert.xyzzy where 
it is applicable.

From: Steve Lawrence 
Sent: Tuesday, April 27, 2021 12:45 PM
To: dev@daffodil.apache.org 
Subject: Re: codecov - annotation to suppress false positives?

We use the sbt-scoverage plugin for generating coverage measurements
before sending them to codecov.io for display. It looks like this does
have a way to exclude packages and sections of code:

https://github.com/scoverage/sbt-scoverage#exclude-classes-and-packages

So we could maybe do something like:

  foo match {
case real1 => ...
case real2 => ...
// $COVERAGE-OFF
case thingy =>  {
  Assert.invariantFailed(".msg...")
}
// $COVERAGE-ON
  }

To exclude the entire case that should never be hit?


On 4/27/21 10:29 AM, Beckerle, Mike wrote:
> We have assertions like:
>
> foo match {
> ... real cases 
> case thingy =>  Assert.invariantFailed(".msg...")
> }
>
> The same thing can happen with if-then-else logic obviously where you make a
> decision, and some paths
> through the logic can't occur.
>
> These always get marked as non-covered, because by nature they're never 
> supposed
> to happen.
>
> Is there a structured comment or some way to tell codecov that this is ok, and
> not to issue a warning about this line?
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com
>
> P +1-781-330-0412
>



Recent push to PR did not run CI tests

2021-04-27 Thread Beckerle, Mike
Any ideas why Susmita's latest update did not cause CI to run?

https://github.com/apache/daffodil/pull/490/checks


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



codecov - annotation to suppress false positives?

2021-04-27 Thread Beckerle, Mike
We have assertions like:

foo match {
... real cases 
case thingy =>  Assert.invariantFailed(".msg...")
}

The same thing can happen with if-then-else logic obviously where you make a 
decision, and some paths
through the logic can't occur.

These always get marked as non-covered, because by nature they're never 
supposed to happen.

Is there a structured comment or some way to tell codecov that this is ok, and 
not to issue a warning about this line?

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: Forgot to squash commits

2021-04-21 Thread Beckerle, Mike
I decided to force-push the squashed commit, but just in case, I do have the 
branch with the 3 separate commits saved, so we could recreate the 3-commit 
scenario if necessary.

So master is now what it is supposed to be: the bug fix (which was just 
adding test cases) has been squashed from 3 commits into 1, per our usual 
workflow practice.

Outstanding pull requests still have to rebase on top, and conflict detection 
should still do the right thing. I checked a couple of PRs and they still show 
no conflicts with the base.


From: John Wass 
Sent: Wednesday, April 21, 2021 4:26 PM
To: dev@daffodil.apache.org 
Subject: Re: Forgot to squash commits

I'd let them be.

On Wed, Apr 21, 2021 at 4:13 PM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

> I ended up committing 3 tiny commits to master, forgot to squash them.
>
> Should I fix this by force push?
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com 
> P +1-781-330-0412
>
>


Forgot to squash commits

2021-04-21 Thread Beckerle, Mike
I ended up committing 3 tiny commits to master, forgot to squash them.

Should I fix this by force push?

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



New JIRA component "Back End C-Generator". Also Issue Types.

2021-04-21 Thread Beckerle, Mike
I added this component as we're going to be merging this stuff onto master soon 
enough.

JIRA tickets associated with this new back end should use this component.

I also wanted to mention that JIRA offers us a huge list of issue types.

I'd like to recommend we stick with exactly 3: Bug, Improvement, New Feature.


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: editconfig

2021-04-21 Thread Beckerle, Mike
Also, did we decide on the autocrlf or force just LF thing?

I think I am in the force-LF-only camp now. I got prompted the other day by 
IntelliJ that I was committing files that contained CRLFs.

Developers on Windows must simply configure their tools to always use LF line 
endings for the Daffodil project.

From: John Wass 
Sent: Wednesday, April 21, 2021 11:46 AM
To: dev@daffodil.apache.org 
Subject: Re: editconfig

> As a Scala project, however, how about using Scalafmt?

I'm in favor of scalafmt also.

> But I assume scalafmt won't cover other files like XML/schema/tdml/text
files.

Take a look at https://github.com/diffplug/spotless

Spotless says it could support all of those, and a quick search says the
SBT plugin is backed by scalafmt.

(I haven't used Spotless, just saw it today and thought of this thread)



On Mon, Apr 19, 2021 at 3:17 PM Interrante, John A (GE Research, US) <
john.interra...@ge.com> wrote:

> I concur with Steve; we're going to need both a scalafmt configuration
> file and an .editorconfig file to cover all source code files unless the
> day comes when scalafmt understands .editorconfig and we're happy with
> scalafmt's default formatting options.
>
> Daffodil's existing code style is supposed to be very close to
> scalariform's default formatting options.  Does anyone know how different
> scalafmt's default formatting options are from scalariform's?  If they're
> not that different, eventually we might end up with just .editorconfig.
>
> -Original Message-
> From: Adam Rosien 
> Sent: Monday, April 19, 2021 12:16 PM
> To: dev@daffodil.apache.org
> Subject: EXT: Re: editconfig
>
> Ah, thanks for the extra context. I'll check out the JIRA issue.
>
> FYI there's an editorconfig integration issue open for scalafmt:
> https://github.com/scalameta/scalafmt/issues/1458.
>
> .. Adam
>
> On Mon, Apr 19, 2021 at 8:51 AM Steve Lawrence 
> wrote:
>
> > As long as scalafmt covers everything editconfig supports and the
> > popular IDE's support it, we'd probably get better results for our
> > scala files. But I assume scalafmt won't cover other files like
> > XML/schema/tdml/text files. We might need a combination of the two to
> > cover all files?
> >
> > See https://issues.apache.org/jira/browse/DAFFODIL-2133 for related
> issue.
> >
> > - Steve
> >
> > On 4/19/21 11:37 AM, Adam Rosien wrote:
> > > As a Scala project, however, how about using Scalafmt? [1] It's
> > > become standard in all the projects I've been involved with; it's
> > > supported by
> > the
> > > language creators and matches the previously mentioned features.
> > >
> > > .. Adam
> > >
> > > [1] https://scalameta.org/scalafmt/
> > >
> > > On Mon, Apr 19, 2021 at 8:20 AM Interrante, John A (GE Research, US)
> > > < john.interra...@ge.com> wrote:
> > >
> > >> I agree, an .editorconfig file at the root of daffodil coupled with
> > >> IDE plugins (some IDEs such as IDEA already support .editorconfig
> > >> without
> > any
> > >> plugin needed) could autoconfigure the following IDE settings
> > automatically
> > >> (if we felt we needed to specify all of these settings):
> > >>
> > >> root = true
> > >>
> > >> # All files (risky - could change bin/dat files inadvertently)
> > >> [*]
> > >> end_of_line = lf
> > >> charset = utf-8
> > >> trim_trailing_whitespace = true
> > >> insert_final_newline = true
> > >> indent_style = space
> > >> indent_size = 2
> > >>
> > >> # Can narrow scope to only source code files
> > >> [*.{java,scala,xml}]
> > >> indent_style = space
> > >> indent_size = 2
> > >>
> > >> EditorConfig plugins format only newly typed lines with these
> > >> settings; they do not reformat existing files, meaning only files
> > >> actually
> > changed by
> > >> one's commit will be affected by these settings.  There are
> > >> separate command-line tools that can check, infer, or fix
> > >> EditorConfig rules
> > across
> > >> one or more directories/files in a repository manually.  I think
> > >> using
> > one
> > >> of these tools such as eclint would be essential for writing a
> > >> proper .editorconfig that narrows its scope as much as possible
> > >> (e.g., we don't want to change existing bin or dat or tdml files
> > >> inadvertently when
> > editing
> > >> a single character within them in Emacs or IDEA because many of
> > >> them use other charsets and are not source code).
> > >>
> > >

Re: all this github spam ?

2021-04-21 Thread Beckerle, Mike
This has to do with crypto mining?  Gaaak.

So their PR contains crypto mining code, and they are doing this to get the CI 
to run it as part of the way CI checks any PR?

Sounds like submitting a PR has to require a Captcha or 2FA.



From: Steve Lawrence 
Sent: Wednesday, April 21, 2021 9:22 AM
To: dev@daffodil.apache.org 
Subject: Re: all this github spam ?

Unfortunately, I'm not sure there's anything we can do about it.

GitHub doesn't give any controls over who can/can't open a PR. We can't
even temporarily close PR's completely.

We could maybe make it so GitHub Actions on PRs must be manually
triggered, so the spammers' cryptocurrency mining stuff would never run.
But that's a bit of a pain, and it relies on the spammers realizing
their stuff isn't being run anymore and taking us off their list. My guess
is we're stuck on their list forever now.

These crypto mining attacks are a known issue for GitHub; hopefully
they're working on a solution. GitHub does eventually detect that
these are spam, closing the accounts and deleting the PRs, but not
until after the PR is created.

As to the archive issue, we could maybe ask infra to remove archives
that are clearly spam (all of them so far say "Demo titles Add
files...", so unique and consistent). But it doesn't solve the
underlying issue.


On 4/21/21 8:59 AM, Beckerle, Mike wrote:
> We seem to be fending off maybe 10 a day github spam attacks where people
> open/close pull requests.
>
> Is there something systematic we can do to avoid this?
>
> This pollutes our mailing lists. I know we can manually purge the PRs from
> github, but these things will live forever in the mail archives, adding a 
> bunch
> of random emails/account names to them, and generally making them less useful.
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com
>
> P +1-781-330-0412
>
>



all this github spam ?

2021-04-21 Thread Beckerle, Mike
We seem to be fending off maybe 10 a day github spam attacks where people 
open/close pull requests.

Is there something systematic we can do to avoid this?

This pollutes our mailing lists. I know we can manually purge the PRs from 
github, but these things will live forever in the mail archives, adding a bunch 
of random emails/account names to them, and generally making them less useful.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: [Discuss] creating Release 3.1.0 and 96 JIRA tickets marked "Major" or higher

2021-04-20 Thread Beckerle, Mike
Besides those 2 bugs, I think we should also merge 
https://github.com/apache/daffodil/pull/490
which adds a new charset (EBCDIC). We should roll this forward if Susmita can't 
take it up.

From: Beckerle, Mike 
Sent: Tuesday, April 20, 2021 3:49 PM
To: dev@daffodil.apache.org 
Subject: Re: [Discuss] creating Release 3.1.0 and 96 JIRA tickets marked 
"Major" or higher


Looking at the critical JIRA tickets, they are:

DAFFODIL-1422<https://issues.apache.org/jira/browse/DAFFODIL-1422> disallow 
doctype decls in all XML & XSD that we read in
DAFFODIL-2473<https://issues.apache.org/jira/browse/DAFFODIL-2473> Need 
bit-wise AND, OR, NOT, and shift operations
DAFFODIL-2183<https://issues.apache.org/jira/browse/DAFFODIL-2183> Unparse 
nilled complex element fails.
DAFFODIL-1598<https://issues.apache.org/jira/browse/DAFFODIL-1598> Unparser: 
For strings that truncate, the dfdl:valueLength function cannot suspend
DAFFODIL-2399<https://issues.apache.org/jira/browse/DAFFODIL-2399> Error 
diagnostics output even though there is an infoset
DAFFODIL-2400<https://issues.apache.org/jira/browse/DAFFODIL-2400> New SAX API 
causes performance degradations.
DAFFODIL-1971<https://issues.apache.org/jira/browse/DAFFODIL-1971> Statement 
order of evaluation not per DFDL Spec

I'm going to suggest how we handle each of these for the 3.1.0 release.

I end up with 2 bugs to fix.

Keep in mind this is discussion fodder.

DAFFODIL-1422<https://issues.apache.org/jira/browse/DAFFODIL-1422> disallow 
doctype decls in all XML & XSD that we read in

  *   release note it - try to fix for 3.2.0

DAFFODIL-2473<https://issues.apache.org/jira/browse/DAFFODIL-2473> Need 
bit-wise AND, OR, NOT, and shift operations

  *   change to priority major - we need better documentation of the use case 
driving this.

DAFFODIL-2183<https://issues.apache.org/jira/browse/DAFFODIL-2183> Unparse 
nilled complex element fails.

  *   Fix. This is actually a pretty bad bug, and some solutions to problems 
that have come up in real schemas need to use nilled complex elements in the 
solution.

DAFFODIL-1598<https://issues.apache.org/jira/browse/DAFFODIL-1598> Unparser: 
For strings that truncate, the dfdl:valueLength function cannot suspend

  *   Release Note, leave open. I question if this is critical. Change 
priority?

DAFFODIL-2399<https://issues.apache.org/jira/browse/DAFFODIL-2399> Error 
diagnostics output even though there is an infoset

  *   Fix. This is actually a pretty bad bug.

DAFFODIL-2400<https://issues.apache.org/jira/browse/DAFFODIL-2400> New SAX API 
causes performance degradations.

  *   Release Note. Change priority to Major. I think this should not be 
critical as SAX is a relatively new feature, and while SAX is a 
performance-oriented feature, it's not like this is a regression on current 
functionality.

DAFFODIL-1971<https://issues.apache.org/jira/browse/DAFFODIL-1971> Statement 
order of evaluation not per DFDL Spec

  *   Release Note.

Comments?


From: Interrante, John A (GE Research, US) 
Sent: Monday, April 12, 2021 2:52 PM
To: dev@daffodil.apache.org 
Subject: [Discuss] creating Release 3.1.0 and 96 JIRA tickets marked "Major" or 
higher

I found the list of open and in-progress issues sorted by priority 
(https://issues.apache.org/jira/issues/?jql=project%20%3D%20DAFFODIL%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20resolution%20%3D%20Unresolved%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC).
  I looked at all the critical issues (there are 8 in this list) and I don't 
think we have to hold up 3.1.0 for them although it would be nice if some got 
fixed first (e.g., DAFFODIL-1422 disallowing doctype decls for security and 
DAFFODIL-2400 fixing SAX API conformance and performance).

However, I would like to merge the new Runtime2 backend in time for the 3.1.0 
release of Daffodil in order to 1) accelerate development by avoiding the need 
to rebase the runtime2 branch on the master branch periodically, and 2) attract 
new developers to help me build out the runtime2 code further.  I need to 
finish a review of my outstanding pull request, merge it, rebase runtime2 on 
the master branch, submit a pull request to merge runtime2, and complete that 
PR’s review, so about 1-2 more weeks.

John

-Original Message-
From: Steve Lawrence 
Sent: Monday, April 12, 2021 1:33 PM
To: dev@daffodil.apache.org
Subject: EXT: Re: [Discuss] creating Release 3.1.0 and 96 JIRA tickets marked 
"Major" or higher

I would also like to see the updates to SAX merged--there's some important API 
conformance and performance fixes that are worth getting in. There's an open 
pull request (PR #478) that I think is close to being ready to be merged.

Of the open critical issues, I think DAFFODIL-2399 might be worth blocking if 

Give a virtual talk at APACHECON about Daffodil?

2021-04-20 Thread Beckerle, Mike
ApacheCon @ Home 2021 is Sept 21-23.  
https://www.apachecon.com/acah2021/index.html

Consider giving a presentation related to Apache Daffodil! The call for 
presentations is open until May 2nd, midnight EDT (US) (= 05:00 UTC). You need 
only formulate the idea for your presentation and submit a proposal for your 
presentation by that time.

This is a virtual conference, so you will be giving the presentation over an 
online system that records it, makes a YouTube video of it, etc.

If you have ideas for a presentation that you'd like to collaborate on, let's 
discuss!


Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: [Discuss] creating Release 3.1.0 and 96 JIRA tickets marked "Major" or higher

2021-04-20 Thread Beckerle, Mike
I'd like DAFFODIL-2482 to get into it;
> https://github.com/apache/daffodil/pull/520
>
> Will increase priority on wrapping this up.
>
>
>
> On Mon, Apr 12, 2021 at 12:43 PM Beckerle, Mike <
> mbecke...@owlcyberdefense.com> wrote:
>
>> I'd like to discuss our need to create a new release of Daffodil,
>> which would be 3.1.0.
>>
>> We have added enough new functionality certainly to justify a release.
>> There are important features already complete, there is the new
>> Runtime2 backend, etc.
>>
>> The challenge is that we have 96 JIRA tickets specifically for bugs
>> that are marked "major" or above in priority.  6 are marked critical,
>> so 90 are "major". (I am excluding all "improvement" and
>> "new-feature" tickets in this count. Just bugs.) Obviously we're not
>> going to fix 96 issues super quickly.
>>
>> Some people advocate a set of criteria for releases which stipulate
>> there can be no critical/blocker issues, and no major issues, only minor 
>> issues.
>> However, the status of critical/major/minor on our JIRA tickets is
>> subjective, most bugs are found and reported by us.
>>
>> Exactly two bugs have "votes" more than zero, which reflects that
>> we've not been using the votes field to prioritize anything, but
>> perhaps we should use votes moving forward, rather than bumping
>> priorities up and down based on our subjective assessment of importance.
>>
>> I believe we need to do a release very soon regardless of these 96 issues.
>> In scrolling through them, evaluating them as "are they more
>> important than doing our first TLP release", none of them rise to
>> that level of importance to me.
>>
>> Most of these issues are part of release 3.0.0 and before that as
>> well, so
>> 3.1.0 would still be an improvement.
>>
>> One way to deal with the critical issues is to specifically discuss
>> them in a release note.
>>
>> Please let's discuss openly. What do you believe must​ be in 3.1.0,
>> that we would hold up a release over?
>>
>> -mike beckerle
>>
>>
>>
>



Re: The future of the daffodil DFDL schema debugger?

2021-04-20 Thread Beckerle, Mike
Welcome Adam,

Here's the link to Adam's book, which looks very useful.

(Not shameless self promotion if someone else sends the link )

https://essentialeffects.dev/

-mikeb



From: Adam Rosien 
Sent: Monday, April 19, 2021 11:21 AM
To: dev@daffodil.apache.org 
Subject: Re: The future of the daffodil DFDL schema debugger?

Hi everybody, I've recently started working on Daffodil with some other
folks and will be helping where I can with the debugger.

I've been writing Scala since ~2011 and recently wrote a book about Cats
Effect, which has a similar scope to ZIO (effects, concurrency, etc.). If
anybody has any questions about the approach and techniques, I'm happy to
help.

.. Adam




editconfig

2021-04-19 Thread Beckerle, Mike
https://editorconfig.org/

This is interesting, and we should consider adding these files to the root of 
daffodil both as a declaration of the code style and as a way to auto-configure 
many IDEs and tools (like github).

This does not appear to be sophisticated enough to really cover code-style 
issues at all, but at least basic whitespace stuff like spaces not tabs, 
2-space indenting, etc. would be covered.




Re: The future of the daffodil DFDL schema debugger?

2021-04-16 Thread Beckerle, Mike
's not big, but
>> every new thing that needs configuration adds complexity and decreases
>> usability.
>>
>> And I think the only reason we are trying to spend effort elliding
>> things is because we're limited to this gdb-like interface where you can
>> only print out a little information at a time.
>>
>> I think what would really help is to dump this gdb interface and instead use
>> multiple windows/views. As a really close example to what I imagine, I
>> recently came across this hex editor:
>>
>> https://www.synalysis.net/
>>
>> The screenshots are a bit small so it's not super clear, but this tool
>> has one view for the data in hex, and one view for a tree of parsed
>> results (which is very similar to our infoset). The "infoset" view has
>> information like offset/length/value, and can be related back to the
>> data view to find the actual bits.
>>
>> I imagine the "next generation daffodil debugger" to look much like
>> this. As data is parsed, the infoset view fills up. This view could act
>> like a standard GUI tree so you could collapse sections or scroll around
>> to show just the parts you care about, and have search capabilities to
>> quickly jump around. The advantage here is you no longer really need
>> automated eliding or heuristics for what the user *might* care about.
>> You just show the whole thing and let user scroll around. As daffodil
>> parses and backtracks, this tree grows or shrinks.
>>
>> I also imagine you could have a cursor moving around the hex view, so as
>> daffodil moves around (e.g. scanning for delimiters, extracting
>> integers), one could update this data view to show what daffodil is
>> doing and where it is.
>>
>> I also imagine there could be other views as well. For example, a schema
>> view to show where in the schema daffodil is, and to add/remove
>> breakpoints. And an information view for things like variables, in-scope
>> delimiters, PoU's, etc.
>>
>> The only reason I mention a debug protocol is that it would allow this GUI
>> to be more easily written in something other than Java/Scala, to take
>> advantage of other GUI toolkits. It's been a long while since I've done
>> anything with Java GUIs, but they seemed pretty poor the last time I looked
>> at them. It would even allow for a TUI, which Java has little/no support
>> for. It also enables things like remote debugging if a socket IPC was
>> used. Though I'm not sure all of that is necessary. Just thinking what
>> would be ideal, and it can always be pared back.
>>
>>
>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>> > I don't think of it as a daffodil debug protocol, but just a separation
>> of concerns between display of information and the behaviors of
>> parse/unparse that need to be points where users can pause, and data
>> structures available to display.
>> >
>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>> clumsy, too big, etc.  The infoset is available in the processor state, and
>> one can examine the current node, enclosing node, prior sibling(s),
>> following sibling(s), etc. One can elide contents that are too big for
>> hexBinary, etc.
>> >
>> > I think this problem, how to display the infoset with sensible limits
>> on sizing, is fairly easy to come up with some design for, that will at
>> least be (1) always fairly small (2) much more useful in more cases. It
>> won't be perfect but can be much better than what we do now.
>> >
>> > One sensible display "mode" should be that displaying the context
>> surrounding the current element (when parsing or unparsing) shows at
>> most N lines (N/2 before, N/2 after), with a maximum length of L characters
>> per line (settable within reason?)
>> >
>> > Sibling and enclosing nodes would be displayed eliding their contents
>> to at most 1 line.
>> >
>> > Here's an example of what I mean. Displaying up to M=10 lines total:
>> >
>> > ...
>> > 
>> >...
>> >89ab782 ...
>> >some text is here and some more text
>> >value might be some big thing which needs to be elided
>> ...
>> > ... 
>> >???
>> > 
>> > ???
>> >
>> > The  is just an idea to reduce XML matching end-tag clutter.
>> >
>> > The ... on a line alone or where element content would appear generally
>> means 1 or more other siblings. The way the display above starts with ...
>> means that this is a relative inner
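
The eliding rule described in that thread is easy to state in code. A minimal
illustrative sketch (names and types invented here; this is not Daffodil's
debugger API): show at most maxLines of context around the current element,
each line capped at maxLen characters:

  // Illustrative only: keep maxLines lines of context (half before,
  // half after the current line) and truncate overly long lines.
  def elideContext(lines: Vector[String], current: Int,
                   maxLines: Int, maxLen: Int): Vector[String] = {
    val half = maxLines / 2
    val from = math.max(current - half, 0)
    val until = math.min(current + half, lines.length)
    lines.slice(from, until).map { line =>
      if (line.length <= maxLen) line
      else line.take(math.max(maxLen - 3, 0)) + "..."
    }
  }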

[Discuss] creating Release 3.1.0 and 96 JIRA tickets marked "Major" or higher

2021-04-12 Thread Beckerle, Mike
I'd like to discuss our need to create a new release of Daffodil, which would 
be 3.1.0.

We have added enough new functionality certainly to justify a release. There 
are important features already complete, there is the new Runtime2 backend, etc.

The challenge is that we have 96 JIRA tickets specifically for bugs that are 
marked "major" or above in priority.  6 are marked critical, so 90 are "major". 
(I am excluding all "improvement" and "new-feature" tickets in this count. Just 
bugs.) Obviously we're not going to fix 96 issues super quickly.

Some people advocate a set of criteria for releases which stipulate there can 
be no critical/blocker issues, and no major issues, only minor issues. However, 
the status of critical/major/minor on our JIRA tickets is subjective, most bugs 
are found and reported by us.

Exactly two bugs have "votes" more than zero, which reflects that we've not 
been using the votes field to prioritize anything, but perhaps we should use 
votes moving forward, rather than bumping priorities up and down based on our 
subjective assessment of importance.

I believe we need to do a release very soon regardless of these 96 issues. In 
scrolling through them, evaluating them as "are they more important than doing 
our first TLP release", none of them rise to that level of importance to me.

Most of these issues are part of release 3.0.0 and before that as well, so 
3.1.0 would still be an improvement.

One way to deal with the critical issues is to specifically discuss them in a 
release note.

Please let's discuss openly. What do you believe must​ be in 3.1.0, that we 
would hold up a release over?

-mike beckerle




Re: Output SVRL from Schematron Validator

2021-04-05 Thread Beckerle, Mike
I looked at the PR for this feature. I think it's fine to have the CLI provide 
an option with a file to write it to, and API-wise, if we decide we have to 
expose this, then a parseResult.validationResult.raw member, or like that, is 
fine with me.

Do we need API-level access to this? E.g. in SAPI/JAPI? I would imagine so.

From: John Wass 
Sent: Monday, March 29, 2021 1:55 PM
To: dev@daffodil.apache.org 
Subject: Re: Output SVRL from Schematron Validator

The thought with the OutputStream was it would be dumped directly to a file
or log or stdout, definitely more of a logging effect than for more
processing, since the structured results from a validator are already
returned as ValidationResult.  That idea looks and sounds worse today than
it did initially.

> What about if each ParseResult has a member

Ah, what if the ParseResult hangs on to the ValidationResult and makes it
accessible that way?

  def validationResult(): Option[ValidationResult]

To support this ValidationResult would become a trait which lets validator
implementations attach custom data and interfaces to the result, which
clients can get to through the ParseResult accessor.

Something like this;
https://github.com/jw3/daffodil/tree/validator_result_refactor

Thoughts?
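
A minimal sketch of the shape being discussed (trait and class names here are
illustrative, not Daffodil's actual API): ValidationResult becomes a trait,
validator implementations attach richer data to it, and clients reach it
through the ParseResult accessor:

  trait ValidationResult {
    def errors: Seq[String]
  }

  // Hypothetical schematron-specific result carrying the raw SVRL text.
  final case class SchematronResult(errors: Seq[String], svrl: String)
    extends ValidationResult

  final class ParseResult(vr: Option[ValidationResult]) {
    def validationResult(): Option[ValidationResult] = vr
  }

  // A caller that knows it configured the schematron validator can
  // pattern match to recover the SVRL:
  def rawSvrl(pr: ParseResult): Option[String] =
    pr.validationResult().collect { case s: SchematronResult => s.svrl }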


On Fri, Mar 26, 2021 at 10:30 AM Steve Lawrence 
wrote:

> What about if each ParseResult has a member that's something like
>
>   val validationData: Option[AnyRef]
>
> Each validator can optionally return some validation data which is then
> store in this member. The user could then access this validation data
> through the ParseResult and cast it to what it should be, as documented
> by the validator.
>
> This allows each validator a way to provide whatever additional data they
> want in whatever format makes the most sense for them.
>
> There's the downside that a user needs to know how to cast this AnyRef
> based on which validator was used. But a similar issue exists if this is
> just an InputStream--you still need to know how to interpret that
> InputStream data. But with this approach, it lets a Validator return
> complex structures that can provide richer information than an
> InputStream could.
>
> On 3/26/21 10:16 AM, John Wass wrote:
> > Reference implementation here
> > https://github.com/jw3/daffodil/tree/validator_outputstream
> >
> > Currently has changes sketched in from the parse result on down.  Need to
> > wire things in through DP and CLI yet.
> >
> > Haven't thought of an alternative that works yet.
> >
> >
> > On Tue, Mar 23, 2021 at 12:59 PM John Wass  wrote:
> >
> >> Looking at DAFFODIL-2482 that came up due to a gap that's blocking
> >> integration of the schematron validation functionality into some
> workflows
> >> that require the full SVRL output, not just the pass/fail status.
> >>
> >> So what needs to happen here is the SVRL that we currently just parse
> for
> >> errors and discard needs to be output in a predictable way. I've tried a
> >> couple things intent on minimizing the footprint of the impl but coming
> up
> >> empty mainly due to violating the reusable validator principle.
> >>
> >> So another unminimized approach would be to provide an additional stream
> >> to all validators for raw output to be written, the implementation of
> that
> >> stream is determined by configuration from the DataProcessor.  The new
> >> output stream is passed at validation-time, which requires changing the
> >> signature of the validate call to accept this output stream in addition
> to
> >> the existing input stream (or we could add another interface, but I'm
> not
> >> convinced of the usefulness of that currently).
> >>
> >> Looking for some thoughts on this approach.
> >>
> >>
> >> [1] https://issues.apache.org/jira/browse/DAFFODIL-2482
> >>
> >>
> >
>
>


Re: XML String in Binary Data Question

2021-04-05 Thread Beckerle, Mike
I will create the test case as you suggest, illustrating the whole situation 
and what Daffodil does today.

What I'm seeking is a way for the string bar to be rendered as a 
string as exactly those characters, so that we *fool* a subsequent XML 
validator into treating the string contents as a tree of well-formed XML 
elements.
An XML schema for the resulting data would not have type xs:string for the 
myString element, but a complex type containing a "foo" child element. XPaths 
like myString/foo would be meaningful in this data.

Arguably, DFDL should not do this, rather, a post-processor of the XML-rendered 
infoset should do this XML-specific transformation.

The analogous situation does also occur for JSON. (Though nobody has asked for 
this as yet.)

The string { "foo" : "bar" } as a string value of a JSON field named "myString" 
would require a bunch of escaping. E.g., perhaps (I don't know JSON so well) 
like

"myString" : "\{ \"foo\": \"bar\" \""

This will be interesting to test.
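
For the XML side, a minimal sketch of the escaping in question, using
scala.xml for illustration (this is not Daffodil's infoset code): a string
element whose value is itself XML text must have its markup characters
escaped to survive as a string.

  import scala.xml.Utility

  val payload = """<foo>bar</foo>"""

  // Escaped form, as it would appear as character content in XML text:
  val escaped = Utility.escape(payload) // "&lt;foo&gt;bar&lt;/foo&gt;"

  // Embedding the payload as a text node round-trips to the same string:
  val elem = <myString>{payload}</myString>
  assert(elem.text == payload)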



From: Interrante, John A (GE Research, US) 
Sent: Monday, April 5, 2021 7:36 AM
To: dev@daffodil.apache.org 
Subject: RE: XML String in Binary Data Question

I was waiting for someone to offer an opinion but it seems it's up to me.  
First of all, please write an actual test case of binary data with a 
well-formed piece of XML data inside a string.  Please round trip it through 
Daffodil so we can actually find out how well both the parser and the unparser 
handle this data.  I find it hard to believe Daffodil doesn't already use some 
escaping or quoting mechanism to handle this kind of situation where the 
infoset (represented as XML) contains an element whose body looks like 
well-formed XML elements in their own turn.

Even if this situation causes the Daffodil unparser to explode, what's to stop 
you from telling Daffodil to represent the infoset as JSON rather than XML?  
Surely the Daffodil unparser wouldn't have a problem unparsing the JSON 
representation with XML elements inside a string element?

I also would be curious to find out whether the infoset's JSON representation 
has a similar problem handling an actual test case of binary data with a 
well-formed piece of JSON data inside a string.

Once we know what really happens (and we also can run the same JSON/XML test 
cases through IBM's Daffodil processor to get more data points), we can start 
to discuss what's the best solution to handle this kind of situation for both 
JSON and XML infoset representations automatically.

John

-Original Message-
From: Beckerle, Mike 
Sent: Friday, April 2, 2021 12:50 PM
To: dev@daffodil.apache.org
Subject: EXT: XML String in Binary Data Question

I've started running into binary data containing XML strings.

If Daffodil is unparsing a piece of XML Like this:

of arbitrary xml

Suppose the DFDL schema for bodyString is:



So the notion here is that the data contains a string, which is a well-formed 
piece of XML.
For example, the overall format may be binary data that just happens to contain 
this string of XML in it.

I suspect that the Daffodil unparser is just going to explode on this, because 
it will be fed element events for the string contents. I.e., the unparsing 
converts the incoming XML text to infoset events by first parsing it as XML, 
and that process is schema-unaware, so has no notion that the XML parse should 
NOT parse the parts of the body string as XML elements.

Does it make sense for Daffodil's XML-text infoset importer (used by unparsing) 
to recognize this case, and convert the of arbitrary 
xml into an escapified XML string like:

ns:well formed=pieceof arbitrary xml/ns:well

and then unparse it as if that string had arrived as this XML event to the 
unparser/XML-text Infoset inputter:

ns:well formed=pieceof arbitrary 
xml/ns:well

So would an option to have this behavior be a reasonable thing to add to 
Daffodil?

The corresponding parse feature would be to emit the string not as escapified 
XML, but just as a string of text of well-formed XML.

I guess the notion is that escapifying strings is because the string contents 
may not be well-formed XML, but in this case since they ARE well formed pieces 
of XML, when a string is required we can emit unescapified XML, and also 
consume the same for unparsing and convert into strings.

Thoughts?



XML String in Binary Data Question

2021-04-02 Thread Beckerle, Mike
I've started running into binary data containing XML strings.

If Daffodil is unparsing a piece of XML Like this:

of arbitrary xml

Suppose the DFDL schema for bodyString is:



So the notion here is that the data contains a string, which is a well-formed 
piece of XML.
For example, the overall format may be binary data that just happens to contain 
this string of XML in it.

I suspect that the Daffodil unparser is just going to explode on this, because 
it will
be fed element events for the string contents. I.e., the unparsing converts the 
incoming XML text to infoset events by first parsing it as XML, and that 
process is schema-unaware, so has no notion that the XML parse should NOT parse 
the parts of the body string as XML elements.

Does it make sense for Daffodil's XML-text infoset importer (used by unparsing) 
to recognize this case, and convert the of arbitrary 
xml into an escapified XML string like:

ns:well formed=pieceof arbitrary xml/ns:well

and then unparse it as if that string had arrived as this XML event to the 
unparser/XML-text Infoset inputter:

ns:well formed=pieceof arbitrary 
xml/ns:well

So would an option to have this behavior be a reasonable thing to add to 
Daffodil?

The corresponding parse feature would be to emit the string not as escapified 
XML, but just as a string of text of well-formed XML.

I guess the notion is that escapifying strings is because the string contents 
may not be well-formed XML, but in this case since they ARE well formed pieces 
of XML, when a string is required we can emit unescapified XML, and also 
consume the same for unparsing and convert into strings.

Thoughts?



Re: Acceptance criteria for merging DFDL-to-C backend (runtime2-2202)?

2021-03-11 Thread Beckerle, Mike
Setup of the C toolchain as well as Scala shouldn't be a problem so long as 
these are all no-cost tools.

From: Interrante, John A (GE Research, US) 
Sent: Thursday, March 11, 2021 11:34 AM
To: dev@daffodil.apache.org 
Subject: RE: Acceptance criteria for merging DFDL-to-C backend (runtime2-2202)?

Is that all?  :-).  I would add some criteria testing runtime2's conformance to 
the DFDL 1.0 specification as well.  Here goes...

1) Sufficient functionality to describe at least one example, of sufficient 
message complexity to indicate that other similar "real" examples should be 
possible

Yup.  We have 7 schemas in 
daffodil-test/src/test/resources/org/apache/daffodil/runtime2 that have "real" 
examples of messages with sufficient complexity to indicate that other real 
messages should be possible.

2) contains built-in tests for one or several such examples, showing each 
supported aspect working.

Yup.  We have TDML tests and binary data/XML infoset files for each of 
these 7 schemas.

3) the tests are fully integrated - they run every time 'sbt test' is run on 
daffodil.

Yup.  We have Scala test classes for each of these 7 schemas' TDML 
tests in daffodil-test/src/test/scala/org/apache/daffodil/runtime2 (a sketch of 
the shape these classes take appears after this message).

4) setup instructions for developers are there so that people know how to 
insure the required C tool-chain elements are there. These need to work on 
Linux and Windows.

Needs work.  The top-level README lists daffodil-runtime2's build 
requirements under Build Requirements and has a single sentence under Getting 
Started telling developers that they will need a C toolchain in order to build 
daffodil-runtime2.  However, I didn't make it clear in the README that 
developers must set up a C toolchain as well as a Java/Scala toolchain, and I 
need to add a "C Setup and Notes" page to the Confluence Wiki with step-by-step 
instructions showing developers how to set up a C toolchain on Linux and 
Windows.  I hope developers won't mind if we make it mandatory to set up both 
Java/Scala and C toolchains before building Daffodil.  I don't like the idea of 
some developers building Daffodil without checking that the C files compile 
successfully and the runtime2 tests pass successfully.  I think it's sufficient 
to guarantee that end-users of Daffodil never have to install a C toolchain to 
use Daffodil.  When end-users install Daffodil, they will get a pure Scala 
Daffodil just like before.  The only user-visible change is that this Daffodil 
will know how to generate, compile, and run C code on the end-user's system if 
the user asks Daffodil to do that.

I also would add another criteria:

5) A subset of Daffodil's DFDL 1.0 specification conformance TDML tests 
modified to run on runtime2 as well as runtime1 during 'sbt test'.

We have lots and lots of TDML tests checking that Daffodil conforms to the DFDL 
1.0 specification.  We should be able to find a subset of these tests that 
should pass on runtime2 as well and verify that they do keep passing when run 
on both runtime1 and runtime2.  This step would go a long way to ensure that 
people working on other aspects of Daffodil do not break the C backend 
inadvertently without knowing it.  Note that this makes it even more mandatory 
that Daffodil developers set up both Java/Scala and C toolchains.  What do you 
all think?
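
For criterion 3, a sketch of the shape those TDML-driven test classes take
(the object/class name, TDML file, and test name here are hypothetical,
invented for illustration):

  import org.apache.daffodil.tdml.Runner
  import org.junit.Test

  object TestHypotheticalFormat {
    // Resource directory and TDML file name are made up for illustration.
    val runner = Runner("/org/apache/daffodil/runtime2/", "hypothetical.tdml")
  }

  class TestHypotheticalFormat {
    import TestHypotheticalFormat._

    // Each @Test method runs one named test from the TDML file, so
    // 'sbt test' exercises these tests on every build.
    @Test def test_parse_simple(): Unit = runner.runOneTest("parse_simple")
  }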


-----Original Message-
From: Beckerle, Mike 
Sent: Thursday, March 11, 2021 9:27 AM
To: dev@daffodil.apache.org
Subject: EXT: Re: Acceptance criteria for merging DFDL-to-C backend 
(runtime2-2202)?

Acceptance criteria to me:

1) Sufficient functionality to describe at least one example, of sufficient 
message complexity to indicate that other similar "real" examples should be 
possible

2) contains built-in tests for one or several such examples, showing each 
supported aspect working.

3) the tests are fully integrated - they run every time 'sbt test' is run on 
daffodil.

4) setup instructions for developers are there so that people know how to 
insure the required C tool-chain elements are there. These need to work on 
Linux and Windows.

I believe these things ensure that people working on other aspects of Daffodil 
will not be inadvertently breaking the C backend undetectably. That's one major 
concern I would have.

From: Interrante, John A (GE Research, US) 
Sent: Wednesday, March 10, 2021 4:24 PM
To: dev@daffodil.apache.org 
Subject: Acceptance criteria for merging DFDL-to-C backend (runtime2-2202)?

Thanks to Mike's suggestion, I have moved the Runtime2 ToDos (changes requested 
by reviewers) from the 
DAFFODIL-2202<https://issues.apache.org/jira/browse/DAFFODIL-2202> issue to an 
AsciiDoc document in the dev/design-notes subtree of the Daffodil website which 
you can read here:

https://daffodil.apache.org/dev/design-notes/runtime2-todos/

I would like to discuss which acceptance 

Re: Acceptance criteria for merging DFDL-to-C backend (runtime2-2202)?

2021-03-11 Thread Beckerle, Mike
Acceptance criteria to me:

1) Sufficient functionality to describe at least one example, of sufficient 
message complexity to indicate that other similar "real" examples should be 
possible

2) contains built-in tests for one or several such examples, showing each 
supported aspect working.

3) the tests are fully integrated - they run every time 'sbt test' is run on 
daffodil.

4) setup instructions for developers are there so that people know how to 
insure the required C tool-chain elements are there. These need to work on 
Linux and Windows.

I believe these things ensure that people working on other aspects of Daffodil 
will not be inadvertently breaking the C backend undetectably. That's one major 
concern I would have.

From: Interrante, John A (GE Research, US) 
Sent: Wednesday, March 10, 2021 4:24 PM
To: dev@daffodil.apache.org 
Subject: Acceptance criteria for merging DFDL-to-C backend (runtime2-2202)?

Thanks to Mike's suggestion, I have moved the Runtime2 ToDos (changes requested 
by reviewers) from the 
DAFFODIL-2202 issue to an 
AsciiDoc document in the dev/design-notes subtree of the Daffodil website which 
you can read here:

https://daffodil.apache.org/dev/design-notes/runtime2-todos/

I would like to discuss which acceptance criteria the runtime2-2202 development 
branch must meet before I can submit a pull request to merge the DFDL-to-C 
backend and code generator into the main branch.  I plan to address the 
Runtime2 ToDos and I especially want to run some of Daffodil's TDML tests on 
the new runtime2 backend as well as the runtime1 backend by adding 
defaultImplementations="daffodil daffodil-runtime2" to certain TDML tests' 
attributes.  (Although I suggest we use the shorter name "daf-c" or "runtime2" 
because "daffodil daffodil-runtime2" is a lot of characters to put into the 
defaultImplementations attribute.)

Daffodil's Confluence describes Runtime2's design here:

https://cwiki.apache.org/confluence/display/DAFFODIL/WIP:+Daffodil+Runtime+2

In particular, it suggests we divide the implementation of runtime2 into two 
distinct phases:

  *   Phase 1 (aka Runtime2P1): No expressions. All lengths are fixed. All 
arrays have fixed length.
  *   Phase 2 (aka Runtime2P2): Adding the DFDL expression language, lengthKind 
'explicit', occursCountKind 'expression'.
I think phase 1 is almost done but we need to run a subset of Daffodil's TDML 
tests on runtime2 before we can really say for sure.
Here is an initial set of discussion points - more questions and criteria are 
welcome:


  1.  Which Daffodil TDML tests do we need to run on runtime2 to assert that 
phase 1 is complete?
  2.  Can we merge runtime2 when these tests pass and then build out phase 2 in 
the main branch, hopefully with help from other developers once they see how 
useful phase 1 is?
  3.  Which of the Runtime2 ToDos need to be done before the merge as well?

Once we agree on a minimal set of acceptance criteria for the merge, I'll copy 
the criteria to the JIRA issue.


Re: incompatible change in 3.1.0-SNAPSHOT

2021-03-05 Thread Beckerle, Mike
Nevermind.

This was my change to make the TDML runner for negative tests compatible with 
cross-testing rigs like the IBM TDML cross tester.

IBM TDML can't express "just get the root element from the file".

So this is a fundamental inconsistency.

Fortunately it comes up in only 1 production schema I have seen.
____
From: Beckerle, Mike
Sent: Friday, March 5, 2021 5:37 PM
To: dev@daffodil.apache.org 
Subject: incompatible change in 3.1.0-SNAPSHOT

Does anyone know why TDML negative tests have to specify root now?

Did we change something so that this is now required?

There are test suites for schemas that depend on the fact that the root is the 
first element of the first schema file, so need not be specified by the TDML 
file.

These tests are all failing on 3.1.0-SNAPSHOT now.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



incompatible change in 3.1.0-SNAPSHOT

2021-03-05 Thread Beckerle, Mike
Does anyone know why TDML negative tests have to specify root now?

Did we change something so that this is now required?

There are test suites for schemas that depend on the fact that the root is the 
first element of the first schema file, so need not be specified by the TDML 
file.

These tests are all failing on 3.1.0-SNAPSHOT now.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com

P +1-781-330-0412



Re: Scala Steward for dependency updates?

2021-03-04 Thread Beckerle, Mike
There's been this increase of software supply chain hacks these days.

Since this bot is on our ongoing development branch, and so long as we watch 
these changes and verify the dependencies it chooses before merging these PRs, 
this makes version updating an incremental effort done as things come 
along. We would inspect and merge the pull requests individually. That's far 
better than the hated job of doing this in bulk and verifying just before a 
release.  
I like the notion that these updates occur as early as possible, so developers 
get experience with the new versions over time.

I very much like that they're one library at a time per pull request. Bite 
sized unit of work.




From: Steve Lawrence 
Sent: Thursday, March 4, 2021 1:10 PM
To: dev@daffodil.apache.org 
Subject: Scala Steward for dependency updates?

I just stumbled across Scala Steward [1]. From their website, "Scala
Steward is a bot that helps you keep library dependencies and sbt
plugins up-to-date."

This bot periodical checks to see if there are any newer versions of
dependencies, and if detected will create a pull request to update that
dependency in the project/Dependencies.scala file.

I've enabled it on my fork as a test, and it just created a bunch of
pull requests, so you can see what it looks like at my fork:

  https://github.com/stevedlawrence/daffodil/pulls

The benefit here is we can rely on this bot to keep our deps updated so
we don't fall behind, and can rely on our GitHub actions to test if
anything breaks for a particular dependency. So much of the process
becomes automated.

Some parts are still manual, like checking that the license for the
dependency hasn't changed, and updating the bin.NOTICE file which
mentions library versions, so there's still some work. But it at
least automates part of the process.

It also has a config file if needed to do things like pin certain
dependencies to a version if needed, configure pull request messages,
etc. My fork above just uses the default configuration.

If we do want to enable this, all we need to do is create a pull request
to add "apache/daffodil" to to scala steward repo's file--pretty simple
change.


Thoughts?

[1] https://github.com/scala-steward-org/scala-steward


Press release: Apache Daffodil graduating to TLP and also DFDL v1.0 Spec finalized

2021-03-04 Thread Beckerle, Mike
It's official and announced!

Apache Daffodil is now TLP. Read more, including nice user testimonials at this 
link.

 - GlobeNewswire 
http://www.globenewswire.com/news-release/2021/03/04/2187212/0/en/The-Apache-Software-Foundation-Announces-Apache-Daffodil-as-a-Top-Level-Project.html

Or same but direct from ASF site: 
https://s.apache.org/18vo



But wait, there's more!


At the same time, the Open Grid Forum has finalized the DFDL v1.0 specification 
as a full "Recommendation", which is what OGF calls a final standard. The OGF 
just sends out a calm announcement to their wg-...@ogf.org mailing list, but 
you can tell by following this link, and looking for the letters "REC" next to 
the DFDL spec (which is at the top of the list).


https://www.ogf.org/ogf/doku.php/documents/documents


the actual official document is this one: 
http://www.ogf.org/documents/GFD.240.pdf


an HTML version of the final spec is here: 
https://opengridforum.github.io/DFDL/current/gwdrp-dfdl-v1.0.8-GFD-R-P.240.htm




Fw: new PCAP without the self-defined variables

2021-03-03 Thread Beckerle, Mike
This discussion should be on dev@daffodil list. Or it is now for posterity.


From: Beckerle, Mike 
Sent: Wednesday, March 3, 2021 10:25 AM
To: Lawrence, Stephen ; Adams, Joshua 

Subject: Re: new PCAP without the self-defined variables

I don't think an SDE is required. It's a nice to have maybe.

Given a schema with choices and dispatching, and backtracking, etc., I can 
imagine there are situations where it would not be possible to rule out static 
cycles that can't or shouldn't occur, depending on how the choices go.

I.e., if the schema compiler said any possibility of a cycle is an SDE, that 
could rule out some legitimate kinds of usage.

So I think a processing error that detects the cycle is probably fine.
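
A minimal illustrative sketch (invented names; not Daffodil's runtime) of what
such a processing-time cycle check could look like: if evaluating a variable's
value re-enters the same variable, fail with a processing error instead of
looping or deadlocking.

  // Illustrative only. Tracks variables currently being evaluated; a
  // re-entrant read of the same variable is reported as a processing error.
  final class VariableEval {
    private var inProgress = Set.empty[String]

    def evaluate(name: String)(compute: => String): String = {
      if (inProgress(name))
        throw new RuntimeException(
          s"Processing error: cyclic read of variable '$name'")
      inProgress += name
      try compute
      finally inProgress -= name
    }
  }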

From: Lawrence, Stephen 
Sent: Wednesday, March 3, 2021 9:36 AM
To: Beckerle, Mike ; Adams, Joshua 

Subject: Re: new PCAP without the self-defined variables

Does this mean that defaultValue and setVariable expressions in NVI that
reference the variable itself should be considered an SDE? I imagine right now
this just triggers a "variable read with no value" error?

Also, reading the spec, I noticed this line about newVariableInstance:

> If the instance is not assigned a new default value then it inherits
> the default value specified by dfdl:defineVariable or externally
> provided by the DFDL processor.

Do we implement this logic? I don't recall seeing it when doing code
reviews.


On 3/2/21 3:48 PM, Beckerle, Mike wrote:
> I recast PCAP so that there are two variables, priorRemainingDottedAddress and
> remainingDottedAddress.
>
> I think this will eliminate your need for the "hack".
>
> This also enabled massively cleaning up the schema.
>
> There's now a parameter DFDL variable named ipAddressElement (maybe not a 
> great
> name), but it serves as a parameter to a shared
> group which is a subroutine for parsing the 1.2.3.4 type ip addresses when
> unparsing.
>
> Before this "code" was duplicated for IPSrc and IPDest. Now they that common
> group is sharable, parameterilzed by
> the one variable.
>
> This compiles, but of course deadlocks when run on my daffodil 3.1.0-SNAPSHOT
> currently.
>
> I've pushed it to my branch pcap-9-nvi, so it should show up as part of the 
> same
> Pull Request you are working from.
>
>
> Mike Beckerle | Principal Engineer
>
> mbecke...@owlcyberdefense.com
>
> P +1-781-330-0412
>
>



Re: Maven Central broken source code links for all versions of Daffodil

2021-03-01 Thread Beckerle, Mike
Thanks Chris,

Ok, so nothing to do here then.

My next thought then, is that we need to cut a new release soon, so that we 
have the most recent release without any incubator artifacts in it.  I'll raise 
that separately.

From: Christofer Dutz 
Sent: Monday, March 1, 2021 4:55 PM
To: dev@daffodil.apache.org 
Subject: AW: Maven Central broken source code links for all versions of Daffodil

Hi Mike,

I hope I didn't misunderstand you, but:
I think nothing has to be changed. The artifacts released during incubation all 
have the name "incubator-" or "incubating-" in them. This is not changed after 
graduation. The first release you do after graduation will not have that 
prefix. Actually, removing the "incubating" from the artifacts in Maven Central 
would lead people to think this was a top-level release where all things are 
expected to be good from a legal point of view, while incubating releases might 
not be as perfect.

Chris

Von: Beckerle, Mike 
Gesendet: Montag, 1. März 2021 22:41
An: dev@daffodil.apache.org
Betreff: Maven Central broken source code links for all versions of Daffodil

So since we are now a TLP, URLs like those on Maven Central that say where our 
source is, they're all broken now, since our repository no longer has 
"incubator-" in its name. (e.g., 
https://gitbox.apache.org/repos/asf/incubator-daffodil.git is used on Maven 
central for our 3.0.0 artifacts, and all prior releases have the same issue.)

I checked with INFRA, and they don't normally fix this by forwarding those 
URLs. (https://issues.apache.org/jira/browse/INFRA-21438)

Does anyone know if this aspect of the Maven Central database can be somehow 
updated without having to re-release everything?

The broken link seems to come out of the XML description of the artifact. I 
assume this artifact is generated from our release process.

Mike Beckerle | Principal Engineer


mbecke...@owlcyberdefense.com
P +1-781-330-0412



Maven Central broken source code links for all versions of Daffodil

2021-03-01 Thread Beckerle, Mike
Now that we are a TLP, the URLs on Maven Central that say where our source is
are all broken, since our repository no longer has "incubator-" in its name.
(e.g., https://gitbox.apache.org/repos/asf/incubator-daffodil.git is used on
Maven Central for our 3.0.0 artifacts, and all prior releases have the same
issue.)

I checked with INFRA, and they don't normally fix this by forwarding those 
URLs. (https://issues.apache.org/jira/browse/INFRA-21438)

Does anyone know if this aspect of the Maven Central database can be somehow 
updated without having to re-release everything?

The broken link seems to come from the XML description (the POM) of the
artifact. I assume that description is generated by our release process.



Re: New Repository for Daffodil Schema Template?

2021-02-25 Thread Beckerle, Mike
I asked INFRA because I saw that the basic tooling doesn't support it.

If it's not something easy for them, I suggest we not bother with this and
just leave it on github/OpenDFDL with the ASL license, as now.



From: Steve Lawrence 
Sent: Thursday, February 25, 2021 1:12 PM
To: dev@daffodil.apache.org 
Subject: Re: New Repository for Daffodil Schema Template?

Is it possible that on the github mirror the .git extension is removed? For
example, our daffodil repo on github isn't daffodil.git. We don't really care
what the gitbox repo name is.

I'd also like to give others a chance to provide input before we bother infra
with this, in case a better name is determined or there's a valid objection.

On 2/25/21 12:58 PM, Beckerle, Mike wrote:
> I have an INFRA ticket requesting this. We'll see if they can do this for us.
>
> https://issues.apache.org/jira/browse/INFRA-21478
> ____
> From: Beckerle, Mike 
> Sent: Thursday, February 25, 2021 12:54 PM
> To: dev@daffodil.apache.org 
> Subject: Re: New Repository for Daffodil Schema Template?
>
> To get a repository created with this ".g8" extension will require an INFRA 
> ticket.
> The GUI for creating a new Apache repo just takes a name like "schema" and 
> creates a repo named "daffodil-schema.git" from it.
> 
> From: Steve Lawrence 
> Sent: Thursday, February 25, 2021 12:44 PM
> To: dev@daffodil.apache.org 
> Subject: New Repository for Daffodil Schema Template?
>
>
> The below relates to DAFFODIL-2144:
>
>   https://issues.apache.org/jira/browse/DAFFODIL-2144
>
> On github.com/OpenDFDL, there is a repository called
> "dfdl-project-layout.g8". This repository is a Giter8 [1] template repo
> that makes it easier for users to create a new schema that follows the
> standard project layout for developing schemas with Daffodil, as
> described here:
>
>   https://daffodil.apache.org/dfdl-layout/
>
> With this template repo, users can run the following command to generate
> a standard project layout for a new format:
>
>   sbt new OpenDFDL/dfdl-project-layout.g8
>
> This will ask a couple questions and then generate files/dirs based on
> the template in the repo. This is very convenient for users to quickly
> start with Daffodil schema development.
>
> However, since this project is so closely related to Daffodil, and since
> it can all be contributed as ALv2, I think it makes more sense to
> move the template to a repository on ASF infrastructure. A Giter8
> template repo has some requirements though:
>
> 1) It must be in its own repository
> 2) The repository name must end in ".g8"
> 3) The repository must be hosted on GitHub
>
> Requirements 1 and 2 mean we cannot use the existing daffodil repo, so a
> new repo is required. We can very easily create a new repo on Apache
> infrastructure with GitBox to meet those requirements, and GitBox will
> mirror it to GitHub to meet the third requirement.
>
> So questions are:
>
> 1) Are there any objections to moving this to Apache infrastructure?
>
> 2) Assuming no objections, what should the name of this new repo be? I
> was thinking something simple like "daffodil-schema.g8" might be a good
> candidate. It's short and easy to remember, and it reads nicely when
> considering that the command to use it would look like
>
>   sbt new apache/daffodil-schema.g8
>
> But I'm open to other suggestions.
>
> Thanks,
> - Steve
>
> [1] http://www.foundweekends.org/giter8/
>



Re: New Repository for Daffodil Schema Template?

2021-02-25 Thread Beckerle, Mike
I have an INFRA ticket requesting this. We'll see if they can do this for us.

https://issues.apache.org/jira/browse/INFRA-21478

From: Beckerle, Mike 
Sent: Thursday, February 25, 2021 12:54 PM
To: dev@daffodil.apache.org 
Subject: Re: New Repository for Daffodil Schema Template?

To get a repository created with this ".g8" extension will require an INFRA 
ticket.
The GUI for creating a new Apache repo just takes a name like "schema" and 
creates a repo named "daffodil-schema.git" from it.

From: Steve Lawrence 
Sent: Thursday, February 25, 2021 12:44 PM
To: dev@daffodil.apache.org 
Subject: New Repository for Daffodil Schema Template?


The below relates to DAFFODIL-2144:

  https://issues.apache.org/jira/browse/DAFFODIL-2144

On github.com/OpenDFDL, there is a repository called
"dfdl-project-layout.g8". This repository is a Giter8 [1] template repo
that makes it easier for users to create a new schema that follows the
standard project layout for developing schemas with Daffodil, as
described here:

  https://daffodil.apache.org/dfdl-layout/

With this template repo, users can run the following command to generate
a standard project layout for a new format:

  sbt new OpenDFDL/dfdl-project-layout.g8

This will ask a couple questions and then generate files/dirs based on
the template in the repo. This is very convenient for users to quickly
start with Daffodil schema development.
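For anyone who hasn't run it, the result is roughly the following layout (an
illustrative sketch only; the exact names come from the template's prompts and
the dfdl-layout conventions, so treat these paths as assumed, not verbatim):

  myformat/
    build.sbt
    src/main/resources/com/example/xsd/myformat.dfdl.xsd
    src/test/resources/com/example/        (test data, expected infosets)
    src/test/scala/com/example/TestMyFormat.scala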

However, since this project is so closely related to Daffodil, and since
it can all be contributed as ALv2, I think it makes more sense to
move the template to a repository on ASF infrastructure. A Giter8
template repo has some requirements though:

1) It must be in its own repository
2) The repository name must end in ".g8"
3) The repository must be hosted on GitHub

Requirements 1 and 2 mean we cannot use the existing daffodil repo, so a
new repo is required. We can very easily create a new repo on Apache
infrastructure with GitBox to meet those requirements, and GitBox will
mirror it to GitHub to meet the third requirement.

So questions are:

1) Are there any objections to moving this to Apache infrastructure?

2) Assuming no objections, what should the name of this new repo be? I
was thinking something simple like "daffodil-schema.g8" might be a good
candidate. It's short and easy to remember, and it reads nicely when
considering that the command to use it would look like

  sbt new apache/daffodil-schema.g8

But I'm open to other suggestions.

Thanks,
- Steve

[1] http://www.foundweekends.org/giter8/


Re: New Repository for Daffodil Schema Template?

2021-02-25 Thread Beckerle, Mike
To get a repository created with this ".g8" extension will require an INFRA 
ticket.
The GUI for creating a new Apache repo just takes a name like "schema" and 
creates a repo named "daffodil-schema.git" from it.

From: Steve Lawrence 
Sent: Thursday, February 25, 2021 12:44 PM
To: dev@daffodil.apache.org 
Subject: New Repository for Daffodil Schema Template?


The below relates to DAFFODIL-2144:

  https://issues.apache.org/jira/browse/DAFFODIL-2144

On github.com/OpenDFDL, there is a repository called
"dfdl-project-layout.g8". This repository is a Giter8 [1] template repo
that makes it easier for users to create a new schema that follows the
standard project layout for developing schemas with Daffodil, as
described here:

  https://daffodil.apache.org/dfdl-layout/

With this template repo, users can run the following command to generate
a standard project layout for a new format:

  sbt new OpenDFDL/dfdl-project-layout.g8

This will ask a couple questions and then generate files/dirs based on
the template in the repo. This is very convenient for users to quickly
start with Daffodil schema development.

However, since this project is so closely related to Daffodil, and since
it can all be contributed as ALv2, I think it makes more sense to
move the template to a repository on ASF infrastructure. A Giter8
template repo has some requirements though:

1) It must be in its own repository
2) The repository name must end in ".g8"
3) The repository must be hosted on GitHub

Requirements 1 and 2 mean we cannot use the existing daffodil repo, so a
new repo is required. We can very easily create a new repo on Apache
infrastructure with GitBox to meet those requirements, and GitBox will
mirror it to GitHub to meet the third requirement.

So questions are:

1) Are there any objections to moving this to Apache infrastructure?

2) Assuming no objections, what should the name of this new repo be? I
was thinking something simple like "daffodil-schema.g8" might be a good
candidate. It's short and easy to remember, and it reads nicely when
considering that the command to use it would look like

  sbt new apache/daffodil-schema.g8

But I'm open to other suggestions.

Thanks,
- Steve

[1] http://www.foundweekends.org/giter8/


Infrastructure cutover to non-incubator - please verify

2021-02-19 Thread Beckerle, Mike
This cutover looks complete to me.

I've just finished removing "incubating-" prefixes from a few last places on 
the wiki.

You can now rename forks and change the git remotes on your local clones of 
daffodil and daffodil-site.
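For example, something like this (adjust for your fork's name and for whether
you use https or ssh):

  # point an existing clone at the renamed repo
  git remote set-url origin https://gitbox.apache.org/repos/asf/daffodil.git
  # or, for a renamed GitHub fork:
  git remote set-url origin git@github.com:YOUR-USERNAME/daffodil.git
  git remote -v   # verify the change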

Please email this list if anything is amiss.



re: TLP graduation - official announcement will be forthcoming

2021-02-18 Thread Beckerle, Mike
Note that the ASF will issue a press release about our graduating to a TLP, so
you may want to wait until that comes out before sending notifications about
this or posting it to your LinkedIn, or whatever. It's not a secret or
anything; it's just that we want the announcement to be noticed, so we don't
want to spread the word too much until the official announcement.



Infrastructure Transition from incubator to TLP

2021-02-18 Thread Beckerle, Mike
The infra ticket for migrating our project infrastructure from incubator to TLP 
is

https://issues.apache.org/jira/browse/INFRA-21438

You may want to watch it. (You don't have to; I'm watching it.)

Supposedly they will do these things for us:

DNS entry, Unix/LDAP group creation, PMC Chair karma, mailing list migration, 
native Git repository migrations (but not git-svn mirrors), Subversion public 
tree migration, buildbot config changes, and website migration.

(Some of these are done already)

For anything else, or if anything goes wrong, we create sub-tickets of the
above INFRA ticket for those tasks.

As for what you will need to do locally...

At some point, the git remote locations of our local daffodil and
daffodil-site clones will all change, because the "incubator" part of the name
will be dropped.

We all have personal forks of daffodil and daffodil-site, and you will need to 
create new forks of the renamed daffodil and daffodil-site repositories once 
those are established, and then change your remotes for your local git repos to 
use the new fork locations.

I am not sure how quickly this will move forward. There's a chance it might
not proceed until the minutes of yesterday's ASF board meeting are published;
that may be what triggers the process.

In the meantime, I believe the content of the daffodil-site repo can be
changed to remove the incubator banners and such, and the primary daffodil git
repo contents can likewise be changed to remove "incubator".

And as of now, we no longer have to qualify Daffodil as "incubator" in our
communications.







avoiding redundant comment and commit messages

2021-02-03 Thread Beckerle, Mike
I seem to get two of everything:

I get one message from "GitBox" sent to commits@daffodil, and
one from "notificati...@github.com" to
incubator-daffo...@noreply.github.com

I cannot guarantee that every message always shows up to both of these.

If I unsubscribe from the latter, will I miss some things, or is all the same
traffic always sent to both?



Re: Suspensions and NewVariableInstance

2021-02-02 Thread Beckerle, Mike
Well, to sanity-check ideas, here are my thoughts on how unparsers, variable
instances, and suspensions/expressions ought to interact.

There may be naive assumptions in here. If so let's find them.

So unparsers call each other in a recursive walk, and variable instances go 
in/out of scope as the unparsers are walked. That's on the variable-map 
structure stacks in UStateMain.

This creates the variable instances, and those specific variable instances 
should be the ones that are frozen into a suspension. I.e., the suspension 
shouldn't contain the variable map with things going into/out of scope, but 
just a single association of variable name to variable instance object.

So I think creating a suspension should create a snapshot of the top-of-stack
for each variable as part of that UStateForSuspension. It's important,
however, that these variable instances are shared with other
expressions/unparser actions for that same scope. So we can't deep copy the
variable instances themselves, or we'll disconnect them from their producers
and consumers. We could in principle copy the variable stacks with all their
pointers to variable instances, but we'll only ever address the top-of-stack
variable instances from a suspension.
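In sketch form, the idea is something like this (pseudo-Scala; the type names
here are stand-ins, not the real variable-map classes):

  // Sketch only: stand-in types, not Daffodil's actual runtime classes.
  type VarQName = String
  trait VariableInstance

  // When creating a suspension, capture a reference to the current
  // (top-of-stack) instance of each in-scope variable. We copy the
  // references, not the instances, so producers/consumers stay connected.
  def snapshotForSuspension(
      variableStacks: Map[VarQName, List[VariableInstance]]
  ): Map[VarQName, VariableInstance] =
    variableStacks.collect {
      case (name, top :: _) => name -> top // shallow: shares the instance
    }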

These variable instances are then sort of floating in air, connected to 
expressions that produce/consume them by the suspension/expression system, but 
they're not being stack-maintained any more. They're heap objects at that 
point, disconnected from the stacks that controlled when they were 
scope-visible. They have to eventually be reclaimed by the garbage collector.

The variable going out of scope should only affect the variable map in the 
UStateMain object, not the suspensions, and it should remove a variable from 
scope, but not otherwise frob the variable-instance, which may be referenced by 
suspensions.

Now, all that said, I bet there's a flaw in there.





From: Adams, Joshua 
Sent: Tuesday, February 2, 2021 11:12 AM
To: dev@daffodil.apache.org 
Subject: Suspensions and NewVariableInstance

I've been running into a lot of headaches trying to get newVariableInstance to
correctly handle suspensions. Currently, when a newVariableInstance statement
is found, NewVariableInstanceStart and End unparsers are created.
NewVariableInstanceStart will immediately create the newVariableInstance with
no value and, if applicable, will create a SuspendableExpression that will
calculate the default value. This works as expected, but as the NVIs go out of
scope we start running into some issues.
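For context, a newVariableInstance annotation in a schema looks roughly like
this sketch (the variable name and default-value expression are made up):

  <xs:sequence>
    <xs:annotation>
      <xs:appinfo source="http://www.ogf.org/dfdl/">
        <!-- new instance of ex:myVar scoped to this sequence; the default
             value expression may have to be suspended during unparse -->
        <dfdl:newVariableInstance ref="ex:myVar" defaultValue="{ ../ex:len }"/>
      </xs:appinfo>
    </xs:annotation>
    ...
  </xs:sequence>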

The NewVariableInstanceEnd unparser simply removes the variable instance that
was created in NVIStart.  It is not performing or checking for any sort of 
suspension, so NVIEnd is called while the NVIStart suspension is still active. 
Since the UStateForSuspension object uses the same VariableMap as the main 
UState object, this results in the variable's value not being correct after it 
goes out of scope.

I'm not sure what a good solution for this would be.  I've attempted adding a 
SuspendableOperation to the NVIEnd unparser to wait until the variable has a 
value before removing, but this results in a SuspensionDeadlock.  I've also 
attempted doing a deep copy of all the variables for each UStateForSuspension 
object, but this too results in a SuspensionDeadlock, not to mention that any 
changes made in one suspension wouldn't be visible to other suspensions. One
other thought I had, which I'm not even sure would address the suspension
issue, is to have whatever sequence contains the NVI statement handle
calling the NVIEnd unparser by simply adding it to the end of its sequence 
child unparsers.

Any thoughts on how best to handle this sticky situation of dealing with 
newVariableInstance and unparsing suspensions?


Re: [RESULT] [VOTE] Contributors - Graduate Apache Daffodil (Incubating) to a top-level project

2021-01-29 Thread Beckerle, Mike
Ok. Will wait until Tuesday 5pm UTC-5

I do have a +1 from John Wass already on the thread.


From: Steve Lawrence 
Sent: Friday, January 29, 2021 10:46 AM
To: dev@daffodil.apache.org 
Subject: Re: [RESULT] [VOTE] Contributors - Graduate Apache Daffodil 
(Incubating) to a top-level project

I'm not sure that's true. Dave Fisher and John Wass haven't officially
voted in this thread. Although unlikely, we also have a couple inactive
PPMC who could still vote for this. And other contributors could bring
up issues that might cause the cancelling of this VOTE. I think we need
to give the minimum 72 hours so people have time to bring up any issues.

On 1/29/21 10:40 AM, Mike Beckerle wrote:
> We have received +1 from all active mentors, PPMC, and Contributors
>
> So the vote passes unanimously, with no need to wait the full time period.
>
> Thank you.
>
> Next step will be to prepare (and discuss here) our proposal to the board
> for graduation. This will then be voted on by the incubator PMC for
> approval to take it to the ASF board. Stay tuned.
>
> Permalink to vote thread:
>
> https://lists.apache.org/thread.html/r91af73ae28284945ffe881cea7877766213eaa405da21e233d85eb83%40%3Cdev.daffodil.apache.org%3E
>
> -mike beckerle
>
> On Thu, Jan 28, 2021 at 6:08 PM Mike Beckerle  wrote:
>
>> One of the first steps in graduating from the Apache incubator to top
>> level is making sure we have consensus among our contributors that this
>> makes sense now. This is an initial step.
>>
>> Please reply with your vote (+1, 0, or -1 with reasons)
>>
>> This vote will be open for at least 72 hours (non-weekend), so until at
>> least 5pm US.ET (UTC-5) on Tuesday, Feb 2.
>>
>> You can review our wiki page about how we meet the ASF project maturity
>> model which is something projects do to self-assess before moving forward.
>> Your comments on this wiki page are also welcome.
>>
>>
>> https://cwiki.apache.org/confluence/display/DAFFODIL/Apache+Daffodil+Maturity+Model+Assesment
>>
>> About voting: see The Apache Voting Process:
>> https://www.apache.org/foundation/voting.html if you'd like to review the
>> process/guidance.
>>
>> My vote +1.
>>
>


