mbeckerle commented on a change in pull request #768:
URL: https://github.com/apache/daffodil/pull/768#discussion_r833642906



##########
File path: 
daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.dfdl.xsd
##########
@@ -0,0 +1,69 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<schema xmlns="http://www.w3.org/2001/XMLSchema"; 
targetNamespace="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset"
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+  xmlns:xsd="http://www.w3.org/2001/XMLSchema";
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";>
+
+  <include 
schemaLocation="org/apache/daffodil/xsd/DFDLGeneralFormat.dfdl.xsd"/>
+
+  <annotation>
+    <appinfo source="http://www.ogf.org/dfdl/";>
+      <dfdl:format ref="tns:GeneralFormat" />
+    </appinfo>
+  </annotation>
+
+  <element name="s1">
+    <complexType>
+      <sequence>
+        <element name="e1" dfdl:encoding="ISO-8859-13" 
dfdl:lengthKind="explicit" dfdl:length="8" type="xsd:string" />
+        <element name="e2" 
dfdl:encoding="X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE" 
dfdl:lengthKind="delimited" type="xsd:string" />

Review comment:
       Use ISO-8859-1 (missing a hyphen) consistent with ISO-8859-13 above. 

##########
File path: 
daffodil-test/src/test/resources/META-INF/services/org.apache.daffodil.processors.charset.CharsetCompiler
##########
@@ -0,0 +1,18 @@
+#  Licensed to the Apache Software Foundation (ASF) under one or more
+#  contributor license agreements.  See the NOTICE file distributed with
+#  this work for additional information regarding copyright ownership.
+#  The ASF licenses this file to You under the Apache License, Version 2.0
+#  (the "License"); you may not use this file except in compliance with
+#  the License.  You may obtain a copy of the License at
+#
+#      http://www.apache.org/licenses/LICENSE-2.0
+#
+#  Unless required by applicable law or agreed to in writing, software
+#  distributed under the License is distributed on an "AS IS" BASIS,
+#  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+#  See the License for the specific language governing permissions and
+#  limitations under the License.
+#
+
+org.apache.daffodil.charsets.BitsCharsetISO88591ReverseCompiler

Review comment:
       This is a good case for exception to the caml-case naming conventions is 
warranted. 
   BitsCharset_ISO_8859_1_ReverseCompiler 
   BitsCharset_ISO_8859_13_Compiler
   Those would be far easier to understand. Being able to pick out the "13" is 
important. 

##########
File path: 
daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.tdml
##########
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData";
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+  xmlns:xs="http://www.w3.org/2001/XMLSchema";
+  xmlns:fn="http://www.w3.org/2005/xpath-functions";
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions";
+  xmlns:ex="http://example.com";
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" 
defaultRoundTrip="none">
+
+  <tdml:parserTestCase name="parse_charsets" root="s1" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <s1>
+          <e1>À1234567</e1>
+          <e2>01234567</e2>
+        </s1>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+    <tdml:document>
+      <tdml:documentPart type="byte"><![CDATA[C0 31 32 33 34 35 36 37
+                                            CF CE CD CC CB CA C9 C8]]>
+      </tdml:documentPart>
+    </tdml:document>
+
+  </tdml:parserTestCase>
+
+  <tdml:unparserTestCase name="unparse_charsets" root="s1" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:document>
+      <tdml:documentPart type="byte"><![CDATA[C0 31 32 33 34 35 36 37
+                                          CF CE CD CC CB CA C9 C8]]>
+      </tdml:documentPart>
+    </tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <tns:s1>
+          <e1>Ą1234567</e1>
+          <e2>01234567</e2>
+        </tns:s1>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+  </tdml:unparserTestCase>
+
+  <tdml:parserTestCase name="parse_charsets2" root="s2" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <s2>
+          <e1>À1234567</e1>
+          <e2>À1234567</e2>

Review comment:
       First char inside <e2> should be Ą once you have a real 8859-13 decoder, 
which I suggest you fix below in comments on the scala code for this charset. 
   

##########
File path: 
daffodil-test/src/test/scala/org/apache/daffodil/charsets/ISO885913.scala
##########
@@ -0,0 +1,59 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.charsets
+
+import org.apache.daffodil.processors.charset.BitsCharsetJava
+import org.apache.daffodil.processors.charset.BitsCharsetDecoderByteSize
+import org.apache.daffodil.io.InputSourceDataInputStream
+import org.apache.daffodil.io.FormatInfo
+import org.apache.daffodil.processors.charset.CharsetCompiler
+import org.apache.daffodil.processors.charset.BitsCharsetFactory
+
+object BitsCharsetISO885913 extends {

Review comment:
       Call this BitsCharsetTest8859_13 or something. 
   
   So this is half a charset.
   
   But this makes your tests wierd. Since they're asymmetric. Parse it behaves 
as 8859-1, unparse as 8859-13.
   
   You can fix this easily because you just need to define decodeOneChar so 
that it does the lookup of the char in a    length 256 string having the chars 
corresponding to bytes 0 to 255 of iso-8859-13, which you can do via 
   ```
   val decodeString = {
     val bytes = ByteBuffer.wrap((0 to 255).map{ _.toByte }.toArray)
     Charset.forName("iso-8859-13").newDecoder().decode(bytes).toString
     }
     
     protected override def decodeOneChar(dis: InputSourceDataInputStream, 
finfo: FormatInfo): Char = {
       val byte = getByte(dis, 0)
       decodeString(byte)
     }
   ```

##########
File path: 
daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.tdml
##########
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData";
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+  xmlns:xs="http://www.w3.org/2001/XMLSchema";
+  xmlns:fn="http://www.w3.org/2005/xpath-functions";
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions";
+  xmlns:ex="http://example.com";
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" 
defaultRoundTrip="none">
+
+  <tdml:parserTestCase name="parse_charsets" root="s1" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <s1>
+          <e1>À1234567</e1>

Review comment:
       Add comment that first character is Unicode Code point 0xC0, LATIN 
CAPITAL LETTER A WITH GRAVE. Just in case someone is lacking a font that 
displays it. 

##########
File path: 
daffodil-test/src/test/resources/org/apache/daffodil/charsets/CustomCharset.tdml
##########
@@ -0,0 +1,151 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<tdml:testSuite xmlns:tdml="http://www.ibm.com/xmlns/dfdl/testData";
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";
+  xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/";
+  xmlns:xs="http://www.w3.org/2001/XMLSchema";
+  xmlns:fn="http://www.w3.org/2005/xpath-functions";
+  xmlns:dfdlx="http://www.ogf.org/dfdl/dfdl-1.0/extensions";
+  xmlns:ex="http://example.com";
+  xmlns:tns="urn:org.apache.daffodil.charsets.CustomCharset" 
defaultRoundTrip="none">
+
+  <tdml:parserTestCase name="parse_charsets" root="s1" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <s1>
+          <e1>À1234567</e1>
+          <e2>01234567</e2>
+        </s1>
+      </tdml:dfdlInfoset>
+    </tdml:infoset>
+    <tdml:document>
+      <tdml:documentPart type="byte"><![CDATA[C0 31 32 33 34 35 36 37
+                                            CF CE CD CC CB CA C9 C8]]>
+      </tdml:documentPart>
+    </tdml:document>
+
+  </tdml:parserTestCase>
+
+  <tdml:unparserTestCase name="unparse_charsets" root="s1" 
model="org/apache/daffodil/charsets/CustomCharset.dfdl.xsd">
+
+    <tdml:document>
+      <tdml:documentPart type="byte"><![CDATA[C0 31 32 33 34 35 36 37
+                                          CF CE CD CC CB CA C9 C8]]>
+      </tdml:documentPart>
+    </tdml:document>
+    <tdml:infoset>
+      <tdml:dfdlInfoset>
+        <tns:s1>
+          <e1>Ą1234567</e1>

Review comment:
       Add comment that first character is Unicode 0x104, LATIN CAPITAL LETTER 
A WITH OGONEK. 

##########
File path: 
daffodil-test/src/test/scala/org/apache/daffodil/charsets/ISO88591Reverse.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.charsets
+
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.processors.charset.BitsCharsetNonByteSize
+import org.apache.daffodil.processors.charset.CharsetCompiler
+import org.apache.daffodil.processors.charset.BitsCharsetFactory
+
+object BitsCharsetISO88591Reverse extends{
+  override val name = "X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE"

Review comment:
       ISO-8859-1 (add hypen)

##########
File path: 
daffodil-test/src/test/scala/org/apache/daffodil/charsets/ISO88591Reverse.scala
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.charsets
+
+import org.apache.daffodil.schema.annotation.props.gen.BitOrder
+import org.apache.daffodil.processors.charset.BitsCharsetNonByteSize
+import org.apache.daffodil.processors.charset.CharsetCompiler
+import org.apache.daffodil.processors.charset.BitsCharsetFactory
+
+object BitsCharsetISO88591Reverse extends{
+  override val name = "X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE"
+  override val bitWidthOfACodeUnit = 8
+  override val decodeString = (0 to 255).map { _.toChar }.mkString.reverse
+  override val replacementCharCode = 0x0
+  override val requiredBitOrder = BitOrder.MostSignificantBitFirst
+} with BitsCharsetNonByteSize
+
+
+final class BitsCharsetISO88591ReverseCompiler
+  extends CharsetCompiler("X-DFDL-ISO-88591-8-BIT-PACKED-LSB-FIRST-REVERSE") {

Review comment:
       add hyphen 8859-1

##########
File path: 
daffodil-io/src/main/scala/org/apache/daffodil/processors/charset/BitsCharsetTransformerFactory.scala
##########
@@ -0,0 +1,35 @@
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.daffodil.processors.charset
+
+//import org.apache.daffodil.processors.charset.BitsCharset

Review comment:
       Hmmm. How does this work without this import. You are definitely 
referencing this BitsCharset below.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to