Re: [Digester] HTML entity decoding?

2009-04-22 Thread Paul Libbrecht


On 22 Apr 2009, at 06:06, Otis Gospodnetic wrote:
I'm no XML guru, so some of this stuff is fuzzy.  Please see my  
comments/questions below.


I'm happy to help ;-)

XML files I'm trying to parse do have links to DTDs in the  
header (sometimes with a full http://... URL, and sometimes with  
just a local file name), but there are no actual DTD files there.   
Is the first step, then, making sure that the referenced DTD files  
really exist at locations pointed to in the header of the XML?


The short answer is yes.
The long answer is yes, unless you manage to configure XML catalogs
(I think that, in the case of Xerces, something such as the
XmlResolver is used), which map public IDs to local files.
That is best for performance but takes a while to configure.
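The public-ID-to-local-file idea can be sketched with a plain SAX EntityResolver. This is only an illustration, not a real catalog setup: the class name LocalDtdResolver, the public ID, the system URL and the one-line DTD are all made up, and the DTD text is kept in memory where a real setup would point at local files.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class LocalDtdResolver implements EntityResolver {
    // Hypothetical mapping: public ID -> DTD text (a real setup would map to local files).
    private final Map<String, String> dtdByPublicId = new HashMap<>();

    public void register(String publicId, String dtdText) {
        dtdByPublicId.put(publicId, dtdText);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = dtdByPublicId.get(publicId);
        if (dtd == null) {
            return null; // unknown ID: let the parser open the system ID itself
        }
        InputSource source = new InputSource(new StringReader(dtd));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }

    /** Parses the XML with the given resolver and returns the root element's text. */
    public static String rootText(String xml, EntityResolver resolver) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // entities are still expanded
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setEntityResolver(resolver);
            return builder.parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement()
                          .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        LocalDtdResolver resolver = new LocalDtdResolver();
        // Made-up public ID and DTD content, for illustration only.
        resolver.register("-//EXAMPLE//DTD Sample//EN", "<!ENTITY uuml \"&#252;\">");
        String xml = "<!DOCTYPE name PUBLIC \"-//EXAMPLE//DTD Sample//EN\" "
                   + "\"http://example.invalid/sample.dtd\">"
                   + "<name>Gr&uuml;ber</name>";
        System.out.println(rootText(xml, resolver)); // the name with the entity expanded
    }
}
```

Because the resolver hands back the DTD itself, the parser never has to fetch the URL named in the DOCTYPE.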


I suppose this is going to live in something that is not command-line,
so DTDs should be cached. At worst, make sure the parser property for
that is set.



Here's a text pointing to such a DTD:
http://www.w3.org/TR/xhtml-modularization/dtd_module_defs.html#a_xhtml_character_entities


So does this mean I would have to ensure that the DTD files contain
things like:

<!ENTITY nbsp   "&#160;">
<!ENTITY iexcl  "&#161;">
<!ENTITY uuml   "&#252;">
...
and so on?
And if my DTD had this, are you saying Digester would decode my:
<name><![CDATA[Gr&uuml;ber]]></name>


no
but
<name>Gr&uuml;ber</name>
(the other form is exactly an escape, which is equivalent to
  <name>Gr&amp;uuml;ber</name>, not what you want!)


to Grüber?  Or to &#252; ?


(Both of the above are equivalent for XML-compliant parsers. A method
reading that XML would only receive Grüber.)
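The difference between the forms discussed here can be checked with the JDK's own JAXP parser. The sketch below is not Digester code: the class name EntityForms is made up, and the entity is declared in an internal DTD subset so that no DTD file is needed. It parses each form and returns the text an application would receive.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public class EntityForms {
    // Internal DTD subset declaring the entity, so no external DTD file is needed.
    private static final String DOCTYPE =
        "<!DOCTYPE name [<!ENTITY uuml \"&#252;\">]>";

    /** Parses DOCTYPE + body and returns the text content of the root element. */
    public static String textOf(String body) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(DOCTYPE + body)))
                .getDocumentElement()
                .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Entity reference and numeric character reference: both are expanded.
        System.out.println(textOf("<name>Gr&uuml;ber</name>"));
        System.out.println(textOf("<name>Gr&#252;ber</name>"));
        // CDATA section and &amp; escape: the text "&uuml;" is kept literally.
        System.out.println(textOf("<name><![CDATA[Gr&uuml;ber]]></name>"));
        System.out.println(textOf("<name>Gr&amp;uuml;ber</name>"));
    }
}
```

The first two calls yield the decoded name. The last two yield the seven-character sequence Gr, ampersand, uuml, semicolon, ber, because the parser is told to treat the ampersand as plain text.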



My end goal is to index this data with Lucene/Solr, so I need it to  
be Grüber before I send it to Lucene/Solr.
In other words, if I end up with &#252;, this is still no good for
me, as I still wouldn't have Grüber.


You could also insert the DTDs inside the solr document.

Note that a validating parser opening the file will certainly grumble
about all sorts of undeclared elements. This is OK: it does not prevent
parsing, but it is, indeed, a validation error.


Uh, I'm lost here.  Which file are you referring to?  DTD or the XML  
file?  Sounds like XML.  And why would I get complaints about  
undeclared elements?


the DTD has the double function of declaring elements and attributes  
as well as entities.
DTD validation will fail if you have just defined entities in your DTD  
but not the relevant elements.

XML parsing will fail if you use entities that you have not defined.


However, you still get the entity expansion.


How?  If I make the XML parser validating?


If you use a conforming parser.


This is what I do to my Digester instance as soon as I create it:
   dig.setValidating(false);


This prevents validation failures (such as undeclared attributes or
elements) from stopping the processing, so it is good.



   dig.setEntityResolver(new NoOpEntityResolver());
And that NoOpEntityResolver is my custom class that implements the  
resolveEntity method:


I believe that is definitely the problem! ;-)
Please note that most DTD files that people refer to are easy to get  
publicly and are often bundled with software.


What kind of files are these that you are reading with Digester?
Do you have samples?
You seem to lack control over the DTDs, in the same loose way as with
HTML files. In that case I would consider the NekoHtml tools.


Note that with the first form, which contains an *escaped* entity,
there's nothing the parser can do! You'd have to match the entities
manually and feed them (re-entrantly) into a parser that parses
entities properly.


Uh, what does this mean? :)
Are you saying &uuml; is the escaped form of the entity?  (what
would be the unescaped form of it?)


I was saying <![CDATA[Gr&uuml;ber]]> or Gr&amp;uuml;ber is the escaped
form, which you can only fix by applying regexps (which might break
other things).
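A regexp-based repair of the escaped form might look like the following. This is a rough sketch, not something from the thread: the class NamedEntityDecoder and its three-entry table are made up for illustration, and a real table would need the full set of entity names used in the data.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedEntityDecoder {
    // Hypothetical, deliberately tiny table; extend it with the names your data uses.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("nbsp", "\u00a0");
        ENTITIES.put("iexcl", "\u00a1");
        ENTITIES.put("uuml", "\u00fc");
    }

    // Matches references of the form &name;
    private static final Pattern REFERENCE = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named references in already-parsed text; unknown ones stay as-is. */
    public static String decode(String text) {
        Matcher m = REFERENCE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = ENTITIES.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(); // leave unknown references untouched
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Gr&uuml;ber"));
    }
}
```

This would run on the text the parser delivers, after parsing. It is also where things can break: a text that really is meant to contain the characters of an entity reference would be rewritten too.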


And what do you mean by there is nothing to do?  (I was hoping the  
parser would do the work and convert &uuml; to ü)
I don't understand the last sentence so I'm not even sure how to  
ask any questions about it but it sounds like you are saying  
that some parsers may simply do what I need, just not Digester?  I'm  
not sure what you mean by manual matching?


Digester is not a parser, it uses the JAXP-available parsers.
By default in JDK >= 1.5, this is a Xerces copy (under com.sun
packages).
If you have other parsers on the classpath, these may be picked up
instead (something in META-INF can be used for that, I think).
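To see which parser implementation JAXP actually picks in a given environment, the factory class can simply be printed. A small sketch follows; the class name WhichParser is made up. The JAXP lookup consults, among other things, the system property javax.xml.parsers.SAXParserFactory and a META-INF/services/javax.xml.parsers.SAXParserFactory file found on the classpath.

```java
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {
    /** Returns the class name of the SAXParserFactory that JAXP selected. */
    public static String factoryClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("SAXParserFactory in use: " + factoryClassName());
    }
}
```

Running it inside the same application (same classpath) as the Digester code shows whether the JDK's bundled parser or another one is in use.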


Xerces does a good job so it's definitely possible to work with it.  
E.g. DTD caching can be configured for it as well as catalogs.


Digester is there to provide the interface between XML parsing and Java
objects.

If you're just producing XML outside, there may be alternatives, indeed.

paul



Re: [SCXML] getting set datats in the datamodel

2009-04-22 Thread Rahul Akolkar
On Wed, Apr 22, 2009 at 9:35 AM, Linda Erlenhov
<linda.erlen...@gmail.com> wrote:
 Hello
 Is there anybody that can help me with my problem described below?

 best regards
 //Linda

 On Mon, Apr 20, 2009 at 2:05 PM, Linda Erlenhov
 <linda.erlen...@gmail.com> wrote:

 Hello
 I think I've mixed two things that don't work together as I hoped they
 would.

 I have this Datamodel, the scxml document starts like this:
 
 <scxml version="1.0" initialstate="INIT"
        xmlns:cs="http://commons.apache.org/scxml"
        xmlns="http://www.w3.org/2005/07/scxml">

 <datamodel>
 <data name="DynamicData">
 <NumDat xmlns="" id="1" type="Integer">0</NumDat>
 </data>
 <data name="Indication1" expr="false"/>
 </datamodel>

 <snip/>

 I assign the Indication1 later on:

 ---
 <state id="StateC">
 <onentry>
 <log label="Renegade" expr="'Entering state: StateC'"/>
 <assign name="Indication1" expr="true"/>
 </onentry>

 <snip/>

 And the DynamicData also later:
 ---
 <state id="StateB">
 <onentry>
 <log label="Renegade" expr="'Entering state: StateB'"/>
 <log label="Renegade" expr="Data(DynamicData,'NumDat')"/>
 <assign location="Data(DynamicData,'NumDat')"
 expr="Data(DynamicData,'NumDat')+1"/>
 <log label="Renegade" expr="Data(DynamicData,'NumDat')"/>
 </onentry>

 <snip/>

 I implemented a custom context with notification functionality in the
 set function (observer/observed pattern), but the problem now is that the
 only time the set function in the context is used is when indications are
 set, not when the DynamicData is set. I know that the SCXML works and that
 the expressions evaluate properly because of the log labels; my guess is
 that it's something with the Data() function that makes these expressions do
 something different. What? Where is the set for the DynamicData located?
<snip/>

Yup, I see what you are running into. Unfortunately for the specific
usage pattern here, the two assign variations have different
semantics as follows:

1) <assign name="..." expr="..."/>
is a set operation, which produces a Context#set(...) call

2) <assign location="..." expr="..."/>
is really a mutation operation, it retrieves the XML data tree
(stored as a DOM node in memory) and manipulates it -- there is no
call to Context#set(...)


 How do I notify when my DynamicData has changed?

<snap/>

ISTR that you prefer to not use custom actions. With those
constraints, one option (since you are generating all the SCXML) is to
accommodate the above variation via the SCXML markup itself -- so
you could generate a redundant identity assignment to trigger the
Context#set(...) call like so:

<!-- assignment below taken from example above -->
<assign location="Data(DynamicData,'NumDat')"
expr="Data(DynamicData,'NumDat')+1"/>
<!-- followed by assignment that triggers the set call with the new value -->
<assign name="DynamicData" expr="DynamicData"/>
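The difference between the two assign forms, and why the identity assignment helps, can be illustrated outside of SCXML. The sketch below does not use the Commons SCXML API: NotifyingContext is a made-up stand-in for a custom context that records each set() call, and the DOM element stands in for the data tree.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SetVersusMutate {
    /** Hypothetical stand-in for a custom context; records every set() call. */
    static class NotifyingContext {
        private final Map<String, Object> vars = new HashMap<>();
        final List<String> notifications = new ArrayList<>();

        void set(String name, Object value) {
            vars.put(name, value);
            notifications.add(name); // the observer hook would live here
        }

        Object get(String name) {
            return vars.get(name);
        }
    }

    /** Replays the sequence from the thread; returns the names that went through set(). */
    public static List<String> run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element numDat = doc.createElement("NumDat");
            numDat.setTextContent("0");

            NotifyingContext ctx = new NotifyingContext();
            ctx.set("DynamicData", numDat);       // datamodel initialisation
            ctx.notifications.clear();            // only watch later changes

            ctx.set("Indication1", Boolean.TRUE); // like assign name=...: goes through set()

            // Like assign location=...: fetch the node and change it in place.
            Element node = (Element) ctx.get("DynamicData");
            int next = Integer.parseInt(node.getTextContent()) + 1;
            node.setTextContent(String.valueOf(next)); // no set() call, no notification

            // The redundant identity assignment forces a set() with the updated tree.
            ctx.set("DynamicData", ctx.get("DynamicData"));
            return new ArrayList<>(ctx.notifications);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // [Indication1, DynamicData]
    }
}
```

The in-place change to the node is invisible to the context, so only the two set() calls show up in the list.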

-Rahul

-
To unsubscribe, e-mail: user-unsubscr...@commons.apache.org
For additional commands, e-mail: user-h...@commons.apache.org



[commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()

2009-04-22 Thread cloudboy
I have an ftp connection that would greatly benefit from having very large 
TCP/IP window sizes (1MB). I'm having trouble figuring out how to implement 
this using the standard FTPClient. What's the intended usage of 
setReceiveBufferSize() and setSendBufferSize(), which are inherited from the 
SocketClient? 

From what I understand, setReceiveBufferSize() must be set prior to binding to 
the socket? However, if these methods are called prior to FTPClient.connect(), 
then this socket object isn't initialized. After FTPClient.connect(), is it 
too late? 

Any insight into configuring the FTPClient to configure these window sizes is 
greatly appreciated. 

Thanks, 

Phil 


Re: [commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()

2009-04-22 Thread Steve Cole
Download the source, modify the SocketClient.connect() method and use the
setReceiveBufferSize() and setSendBufferSize() methods and see if it gives
you results you want.
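The ordering in question can be tried with a plain java.net.Socket, independent of Commons Net. In this sketch the class and method names are made up, and the loopback server exists only to have something to connect to. The buffer sizes are requested on an unconnected socket and the connection is opened afterwards.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BufferBeforeConnect {
    /** Requests the buffer sizes on an unconnected socket, then connects it. */
    public static Socket connect(String host, int port, int bufferBytes) {
        Socket socket = new Socket(); // created, but not yet connected
        try {
            socket.setReceiveBufferSize(bufferBytes); // a hint to the OS, given before connect
            socket.setSendBufferSize(bufferBytes);
            socket.connect(new InetSocketAddress(host, port), 5000);
            return socket;
        } catch (IOException e) {
            try {
                socket.close();
            } catch (IOException ignored) {
                // nothing more to do for a socket that failed to connect
            }
            throw new UncheckedIOException(e);
        }
    }

    /** Connects to a throwaway loopback server and reports the receive buffer in effect. */
    public static int loopbackReceiveBuffer(int bufferBytes) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = connect(loopback.getHostAddress(),
                                     server.getLocalPort(), bufferBytes)) {
            return client.getReceiveBufferSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("receive buffer in effect: " + loopbackReceiveBuffer(1 << 20));
    }
}
```

The value reported afterwards is whatever the operating system granted, which may differ from the 1MB requested. How to get such a pre-configured socket into FTPClient depends on the Commons Net version and is not shown here.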

Have you tried using the FTPClient.setBufferSize() method, which sets the
buffer size of the BufferedInputStream used for the retrieveFile() method and
of the BufferedOutputStream used for the storeFile() method?

----- Original Message -----
From: cloud...@comcast.net
To: user@commons.apache.org
Sent: Wednesday, April 22, 2009 3:01 PM
Subject: [commons-net] FTPClient setReceiveBufferSize() setSendBufferSize()


 I have an ftp connection that would greatly benefit from having very large
TCP/IP window sizes (1MB). I'm having trouble figuring out how to implement
this using the standard FTPClient. What's the intended usage of
setReceiveBufferSize() and setSendBufferSize(), which are inherited from the
SocketClient?

 From what I understand, setReceiveBufferSize() must be set prior to
binding to the socket? However, if these methods are called prior to
FTPClient.connect(), then this socket object isn't initialized. After
FTPClient.connect(), is it too late?

 Any insight into configuring the FTPClient to configure these window sizes
is greatly appreciated.

 Thanks,

 Phil


