Hi Isha,

I noticed the *transient* modifier too, but it is the same in 3.2 and causes no problems there; the *clazz* field seems to get initialized by the *setup()* method of the output port from the TUPLE_CLASS attribute:
https://github.com/apache/incubator-apex-malhar/blob/release-3.2/contrib/src/main/java/com/datatorrent/contrib/schema/parser/Parser.java
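To illustrate the behavior under discussion, here is a minimal sketch (not Malhar code): a transient field comes back null after a serialization round trip and has to be re-derived in a setup()-style hook. FakeParser, tupleClassName and roundTrip are hypothetical names, java.io serialization is only a stand-in for whatever serializer the platform actually uses, and the class-name string is only a stand-in for the TUPLE_CLASS port attribute.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class TransientFieldDemo {

  /** Hypothetical stand-in for an operator that keeps its tuple class in a transient field. */
  static class FakeParser implements Serializable {
    private static final long serialVersionUID = 1L;

    transient Class<?> clazz;   // skipped by serialization, so null after a round trip
    String tupleClassName;      // non-transient; assumed stand-in for a port attribute

    void setClazz(Class<?> c) {
      clazz = c;
      tupleClassName = c.getName();
    }

    /** Re-derives the transient field from state that survived serialization. */
    void setup() throws ClassNotFoundException {
      if (clazz == null && tupleClassName != null) {
        clazz = Class.forName(tupleClassName);
      }
    }
  }

  /** Serializes the object to a byte array and reads it back. */
  static FakeParser roundTrip(FakeParser p) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(p);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (FakeParser) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    FakeParser p = new FakeParser();
    p.setClazz(String.class);

    FakeParser copy = roundTrip(p);
    System.out.println("after round trip: " + copy.clazz);  // null

    copy.setup();
    System.out.println("after setup():    " + copy.clazz);  // class java.lang.String
  }
}
```

If setup() does not restore the field (or something else is called on it first), any code that uses clazz after deserialization sees null.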
Not sure why the same thing doesn't happen in 3.3 -- haven't had time to chase that down -- since adding this doesn't fix the issue:

*dag.setOutputPortAttribute(parser.out, PortContext.TUPLE_CLASS, Employee.class);*

Regardless of which output port I use, I get the same exception, since it comes from *XmlParser.setup()*.

Yes, I agree that we should revert this to the way it was in 3.2, not because of simplicity/complexity but to follow semantic versioning principles: 3.3 should be backward compatible with 3.2. The package change means that any existing use of this operator in 3.2 will not even compile with 3.3. We can put the change back for X.Y.Z for some X > 3.

Ram

On Mon, May 9, 2016 at 5:25 PM, Isha Arkatkar <[email protected]> wrote:

> Hi Ram,
>
> Were you able to resolve this issue? I debugged this problem a little
> bit to find the root cause. It turned out there is indeed a problem in
> the XmlParser operator: the clazz variable is marked transient in the
> Parser superclass, so its value was dropped during serialization and
> deserialization and was null in the setup call. This did not get caught
> in unit tests, since those didn't test the application serialization
> part. To fix the issue, the transient modifier should be removed from
> the clazz field in the Parser class.
>
> Moreover, I think you are right about the DocumentBuilder class, though
> I am afraid I cannot remember the reason for adding parsedOutput as an
> output port. Could you use the output port 'out' in the superclass as
> before?
>
> To give a little background on moving this operator from malhar-contrib
> to library: originally, while adding xsd schema validation, I had
> changed the dependency from XStream to JAXB. Since there were no
> additional dependencies needed for the XML parser anymore, I moved the
> class to malhar-library. This introduced some of the issues you saw.
> In retrospect, I was wondering if it makes sense to revert this class to
> 3.2 if XStream usage was more straightforward.
>
> Thanks,
> Isha
>
> On Mon, May 9, 2016 at 10:05 AM, Munagala Ramanath <[email protected]>
> wrote:
>
> > Looks like the *XmlParser* operator in 3.3 is broken in a couple of ways:
> >
> > 1. It uses *DocumentBuilder* and related classes but supplies the XML
> >    input string to *DocumentBuilder.parse()*. But that method takes a
> >    File, InputStream, InputSource or URI, _not_ an XML string:
> >    https://docs.oracle.com/javase/7/docs/api/javax/xml/parsers/DocumentBuilder.html
> > 2. It overrides *setup()* and within it invokes
> >    *JAXBContext.newInstance(getClazz());* which fails if the *clazz*
> >    field is null; this was not the case with the 3.2 version -- still
> >    trying to figure out why *clazz* is null even after I explicitly set
> >    it to a non-null value in *populateDAG()*.
> >
> > I'll create a JIRA and add more details there.
> >
> > Ram
> >
> > On Sun, May 8, 2016 at 7:02 PM, Munagala Ramanath <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > I wrote a small app to exercise the XmlParser operator.
> > > The app works fine with Malhar 3.2 but fails with 3.3 with an
> > > exception like this:
> > >
> > > java.lang.IllegalArgumentException
> > >   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:637)
> > >   at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:584)
> > >   at com.datatorrent.lib.parser.XmlParser.setup(XmlParser.java:135)
> > >   at com.datatorrent.lib.parser.XmlParser.setup(XmlParser.java:63)
> > >   at com.datatorrent.stram.engine.Node.setup(Node.java:161)
> > >   at com.datatorrent.stram.engine.StreamingContainer.setupNode(StreamingContainer.java:1287)
> > >   at com.datatorrent.stram.engine.StreamingContainer.access$100(StreamingContainer.java:92)
> > >   at com.datatorrent.stram.engine.StreamingContainer$2.run(StreamingContainer.java:1361)
> > >
> > > The operator has moved to the library module in 3.3 from contrib and
> > > there are other changes as well, so I made the minor changes needed to
> > > accommodate the move but to no avail. I tried both 3.2.0 and 3.3.0 of
> > > apex-core, tried adding JAXB annotations to the Employee class but
> > > nothing seems to make any difference -- I get the same exception.
> > >
> > > My app for 3.3 (slightly different for 3.2) looks like this:
> > > -------------------------------------
> > > public void populateDAG(DAG dag, Configuration conf)
> > > {
> > >   Gen gen = dag.addOperator("generator", new Gen());
> > >
> > >   // configure parser
> > >   XmlParser parser = dag.addOperator("parser", new XmlParser());
> > >   parser.setClazz(Employee.class);
> > >
> > >   ConsoleOutputOperator cons = dag.addOperator("console", new ConsoleOutputOperator());
> > >
> > >   dag.addStream("input", gen.output, parser.in).setLocality(Locality.CONTAINER_LOCAL);
> > >   dag.addStream("data", parser.parsedOutput, cons.input).setLocality(Locality.CONTAINER_LOCAL);
> > > }
> > > ----------------------------------------
> > >
> > > Both versions of the project are in branch *add-xmlparse* at:
> > > *[email protected]:amberarrow/examples.git*
> > >
> > > Anybody know the right way to use this operator in 3.3?
> > >
> > > Thanks.
> > >
> > > Ram
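On the DocumentBuilder.parse() point raised in the thread: parse(String) interprets its argument as a URI, so XML text held in a String is usually wrapped in an InputSource backed by a StringReader. The following is a minimal sketch using only the JDK's XML classes; ParseXmlStringDemo and parseXml are hypothetical names, and this is not necessarily how the operator itself should be fixed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParseXmlStringDemo {

  /** Parses XML text held in memory into a DOM Document. */
  static Document parseXml(String xml) throws Exception {
    DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    // builder.parse(xml) would treat the string as a URI to fetch,
    // so wrap the text in an InputSource instead.
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parseXml("<employee><name>Ann</name></employee>");
    System.out.println(doc.getDocumentElement().getNodeName());  // employee
  }
}
```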
