Re: [Digester] HTML entity decoding?
On 22 Apr 2009, at 06:06, Otis Gospodnetic wrote:

> I'm no XML guru, so some of this stuff is fuzzy. Please see my comments/questions below.

I'm happy to help ;-)

> XML files I'm trying to parse do have links to DTDs in the header (sometimes with a full http://... URL, and sometimes with just a local file name), but there are no actual DTD files there. Is the first step, then, making sure that the referenced DTD files really exist at locations pointed to in the header of the XML?

The short answer is yes. The long answer is yes, unless you manage to configure XML catalogs (I think that, in the case of Xerces, something such as the XmlResolver is used), which associate public IDs with local files. That's best for performance but takes a long time to configure. I suppose this is going to be living in something that is not command-line, so DTDs should be cached. At worst, make sure the parser property for that is set.

> Here's a text pointing to such a DTD: http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities
>
> So does this mean I would have to ensure that the DTD files contain things like:
>
>   <!ENTITY nbsp "&#160;">
>   <!ENTITY iexcl "&#161;">
>   <!ENTITY uuml "&#252;">
>
> ... and so on? And if my DTD had this, are you saying Digester would decode my:
>
>   <name><![CDATA[Gr&uuml;ber]]></name>

No, but it would decode <name>Gr&uuml;ber</name>. (The CDATA form is exactly an escape, equivalent to <name>Gr&amp;uuml;ber</name>, which is not what you want!)

> to Grüber? Or to &#252;?

Both of the above are equivalent in XML-compliant parsers. A method reading that XML would only receive Grüber.

> My end goal is to index this data with Lucene/Solr, so I need it to be Grüber before I send it to Lucene/Solr. In other words, if I end up with &#252;, this is still no good for me, as I still wouldn't have Grüber.

You could also insert the DTDs inside the Solr document.

> Note that opening the file with a validating parser will certainly grumble about all sorts of undeclared elements, this is ok, it does not prevent parsing but is, indeed, a validation error.
>
> Uh, I'm lost here. Which file are you referring to? DTD or the XML file? Sounds like XML. And why would I get complaints about undeclared elements?

The DTD has the double function of declaring elements and attributes as well as entities. DTD validation will fail if you have defined only entities in your DTD but not the relevant elements. XML parsing will fail if you use entities that you have not defined.

> However you get the entity-expansion.
>
> How? If I make the XML parser validating?

If you use a conforming parser.

> This is what I do to my Digester instance as soon as I create it:
>
>   dig.setValidating(false);

This prevents validation failures (such as undeclared attributes or elements) from stopping processing. It is good.

>   dig.setEntityResolver(new NoOpEntityResolver());
>
> And that NoOpEntityResolver is my custom class that implements the resolveEntity method:

I believe that is definitely the problem! ;-)

Please note that most DTD files that people refer to are easy to get publicly and are often bundled with software. What kind of files are these that you are reading with Digester? Do you have samples? You seem to be lacking control of the DTDs, in the same fancy way as happens with HTML files. I would consider the NekoHTML tools then.

> Note that with the first form, which contains an *escaped* entity, there's nothing to do! You'd have to match them manually (re-entrantly) into a parser that parses entities properly.
>
> Uh, what does this mean? :) Are you saying &uuml; is the escaped form of the entity? (what would be the unescaped form of it?)

I was saying that <![CDATA[Gr&uuml;ber]]> or Gr&amp;uuml;ber is the escaped form, which you can only fix by applying regexps (which might break other things).

> And what do you mean by there is nothing to do?
> (I was hoping the parser would do the work and convert &uuml; to ü) I don't understand the last sentence so I'm not even sure how to ask any questions about it, but it sounds like you are saying that some parsers may simply do what I need, just not Digester? I'm not sure what you mean by manual matching?

Digester is not a parser; it uses the JAXP-available parsers. By default in JDK >= 1.5, this is a Xerces copy (under com.sun packages). If you have other parsers on the classpath, these may be picked up instead (something in META-INF can be used, I think). Xerces does a good job, so it's definitely possible to work with it. E.g. DTD caching can be configured for it, as well as catalogs.

Digester is there to make the interface between XML parsing and Java objects. If you're just producing XML outside, there may be alternatives, indeed.

paul
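
The behaviour described in this message can be tried out with a minimal sketch that uses the JDK's JAXP DocumentBuilder directly rather than Digester. The class name EntityDemo, the system ID http://example.invalid/names.dtd and the in-memory DTD string are assumptions made for illustration only; a real setup would resolve to a cached copy of the actual DTD. With a resolver that supplies a DTD declaring the entity, &uuml; in element content should come back as ü, while the same text inside CDATA stays literal.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

final class EntityDemo {
    // Assumption: stands in for a locally cached DTD that declares the entity.
    static final String LOCAL_DTD = "<!ENTITY uuml \"&#252;\">";

    // Hypothetical system ID; the resolver below intercepts it, so nothing is fetched.
    static final String DOCTYPE =
            "<!DOCTYPE name SYSTEM \"http://example.invalid/names.dtd\">";

    // Parses the XML without validation and returns the root element's text.
    static String rootText(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // undeclared elements are not reported
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve the DTD from memory instead of an empty (no-op) source.
            builder.setEntityResolver((publicId, systemId) ->
                    new InputSource(new StringReader(LOCAL_DTD)));
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Entity reference in element content: expanded by the parser.
        System.out.println(rootText(DOCTYPE + "<name>Gr&uuml;ber</name>"));
        // Same characters inside CDATA: passed through untouched.
        System.out.println(rootText(DOCTYPE + "<name><![CDATA[Gr&uuml;ber]]></name>"));
    }
}
```

If the resolver returned an empty source instead, the parser would have no declaration for uuml, which matches the failure discussed above.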
Re: [SCXML] getting set datats in the datamodel
On Wed, Apr 22, 2009 at 9:35 AM, Linda Erlenhov linda.erlen...@gmail.com wrote:

> Hello
> Is there anybody that can help me with my problem described below?
>
> best regards
> //Linda
>
> On Mon, Apr 20, 2009 at 2:05 PM, Linda Erlenhov linda.erlen...@gmail.com wrote:
>
>> Hello
>> I think I've done some mixing between two things that don't work together as I hoped they would. I have this datamodel; the SCXML document starts like this:
>>
>>   <scxml version="1.0" initialstate="INIT"
>>          xmlns:cs="http://commons.apache.org/scxml"
>>          xmlns="http://www.w3.org/2005/07/scxml">
>>     <datamodel>
>>       <data name="DynamicData">
>>         <NumDat xmlns="" id="1" type="Integer">0</NumDat>
>>       </data>
>>       <data name="Indication1" expr="false"/>
>>     </datamodel>
>>   <snip/>
>>
>> I assign the Indication1 later on:
>>
>>   <state id="StateC">
>>     <onentry>
>>       <log label="Renegade" expr="'Entering state: StateC'"/>
>>       <assign name="Indication1" expr="true"/>
>>     </onentry>
>>   <snip/>
>>
>> And the DynamicData also later:
>>
>>   <state id="StateB">
>>     <onentry>
>>       <log label="Renegade" expr="'Entering state: StateB'"/>
>>       <log label="Renegade" expr="Data(DynamicData,'NumDat')"/>
>>       <assign location="Data(DynamicData,'NumDat')" expr="Data(DynamicData,'NumDat')+1"/>
>>       <log label="Renegade" expr="Data(DynamicData,'NumDat')"/>
>>     </onentry>
>>   <snip/>
>>
>> I implemented a custom context with notification functionality in the set function (observer/observed pattern), but the problem now is that the only time the set function in the context is used is when indications are set, not when the DynamicData is set. I know that the SCXML works and that the expressions evaluate properly because of the log labels; my guess is that it's something with the Data() function that makes these expressions do something different. What? Where is the set for the DynamicData located?

<snip/>

Yup, I see what you are running into. Unfortunately for the specific usage pattern here, the two <assign> variations have different semantics, as follows:

1) <assign name="..." expr="..."/> is a set operation, which produces a Context#set(...) call
2) <assign location="..." expr="..."/> is really a mutation operation: it retrieves the XML data tree (stored as a DOM node in memory) and manipulates it -- there is no call to Context#set(...)

>> How do I notify when my DynamicData has changed?

<snap/>

ISTR that you prefer to not use custom actions. With those constraints, one option (since you are generating all the SCXML) is to accommodate the above variation via the SCXML markup itself -- so you could generate a redundant identity assignment to trigger the Context#set(...) call like so:

  <!-- assignment below taken from example above -->
  <assign location="Data(DynamicData,'NumDat')" expr="Data(DynamicData,'NumDat')+1"/>
  <!-- followed by assignment that triggers the set call with the new value -->
  <assign name="DynamicData" expr="DynamicData"/>

-Rahul

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org
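
The difference between the two assign variations in this thread can be illustrated with a small plain-Java sketch. It does not use Commons SCXML at all: NotifyingContext and AssignDemo are hypothetical stand-ins, and the only thing borrowed from the thread is the idea that the notification hook lives in set() while the data itself is a DOM node. Changing the stored node in place goes unnoticed; re-setting the same name afterwards triggers the hook with the updated value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;

// Hypothetical stand-in for a custom context; this is NOT the Commons SCXML API.
final class NotifyingContext {
    private final Map<String, Object> vars = new HashMap<>();
    final List<String> notifications = new ArrayList<>();

    void set(String name, Object value) {
        vars.put(name, value);
        notifications.add(name); // observers only hear about set() calls
    }

    Object get(String name) {
        return vars.get(name);
    }
}

final class AssignDemo {
    // Stores a NumDat node with text "0" under "DynamicData" (first notification).
    static NotifyingContext newContext() {
        try {
            Element numDat = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument().createElement("NumDat");
            numDat.setTextContent("0");
            NotifyingContext ctx = new NotifyingContext();
            ctx.set("DynamicData", numDat);
            return ctx;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mimics the location-style assign: the stored node is changed in place.
    static void incrementInPlace(NotifyingContext ctx) {
        Element numDat = (Element) ctx.get("DynamicData");
        int next = Integer.parseInt(numDat.getTextContent()) + 1;
        numDat.setTextContent(String.valueOf(next)); // set() is never called
    }

    // Mimics the redundant identity assignment: same object, but via set().
    static void identityAssign(NotifyingContext ctx) {
        ctx.set("DynamicData", ctx.get("DynamicData"));
    }

    // Reads the current text of the stored node.
    static String value(NotifyingContext ctx) {
        return ((Element) ctx.get("DynamicData")).getTextContent();
    }
}
```

Calling newContext(), then incrementInPlace(), then identityAssign() leaves two entries in notifications: one for the initial set and one for the identity assignment, with none for the in-place change.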
[commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()
I have an ftp connection that would greatly benefit from having very large TCP/IP window sizes (1MB). I'm having trouble figuring out how to implement this using the standard FTPClient. What's the intended usage of setReceiveBufferSize() and setSendBufferSize(), which are inherited from the SocketClient? From what I understand, setReceiveBufferSize() must be set prior to binding to the socket? However, if these methods are called prior to FTPClient.connect(), then this socket object isn't initialized. After FTPClient.connect(), is it too late? Any insight into configuring the FTPClient to configure these window sizes is greatly appreciated. Thanks, Phil
Re: [commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()
Download the source, modify the SocketClient.connect() method and use the setReceiveBufferSize() and setSendBufferSize() methods, and see if it gives you the results you want.

Have you tried using the FTPClient.setBufferSize() method, which sets the buffer size of the BufferedInputStream used for the retrieveFile() method and the BufferedOutputStream used for the storeFile() method?

----- Original Message -----
From: cloud...@comcast.net
To: user@commons.apache.org
Sent: Wednesday, April 22, 2009 3:01 PM
Subject: [commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()

I have an ftp connection that would greatly benefit from having very large TCP/IP window sizes (1MB). I'm having trouble figuring out how to implement this using the standard FTPClient. What's the intended usage of setReceiveBufferSize() and setSendBufferSize(), which are inherited from the SocketClient? From what I understand, setReceiveBufferSize() must be set prior to binding to the socket? However, if these methods are called prior to FTPClient.connect(), then this socket object isn't initialized. After FTPClient.connect(), is it too late? Any insight into configuring the FTPClient to configure these window sizes is greatly appreciated.

Thanks, Phil
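
The ordering question raised in this thread can be shown with a minimal sketch at the plain java.net.Socket level: request the buffer sizes on an unconnected socket, then connect. The class TunedSocket and its method names are hypothetical, and nothing here touches FTPClient; how a given Commons Net version creates and connects its socket is not verified here. The operating system is free to clamp the requested size, so the sketch reads back what was actually granted.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class TunedSocket {
    // Creates an unconnected socket and requests the buffer sizes up front,
    // so that they are in place before the TCP handshake.
    static Socket unconnected(int bufferBytes) {
        Socket socket = new Socket(); // not yet bound or connected
        try {
            socket.setReceiveBufferSize(bufferBytes);
            socket.setSendBufferSize(bufferBytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return socket;
    }

    // Connects only after the buffer sizes have been requested.
    static Socket connect(String host, int port, int bufferBytes, int timeoutMillis)
            throws IOException {
        Socket socket = unconnected(bufferBytes);
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        return socket;
    }

    // Returns {receive, send} sizes the OS actually granted for a request.
    static int[] effectiveSizes(int bufferBytes) {
        try (Socket socket = unconnected(bufferBytes)) {
            return new int[] {socket.getReceiveBufferSize(), socket.getSendBufferSize()};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing effectiveSizes(1 << 20) with the requested 1 MB is a quick way to see whether system limits, rather than the client library, are capping the window.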