On Thursday 30 September 2004 5:39 am, Mario Ivankovits wrote:
> Steve Cohen wrote:
>  >First parser to successfully parse every
>  >item in the listing I suppose.  Or do you go for best score?
>
> Line by line. The first parser that is able to parse should be cached
> (for performance); that way it might only be slower on the first match.
> However, the parser should be prepared to redetect the language as soon
> as it fails later on. Maybe there are minor differences between languages
> and the first detection wasn't correct, e.g. "Mar" (March) might not be
> unique if you talk to a German FTP server which does not use umlauts
> (März => Mar).

This business of constantly churning does bother me. 
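A minimal Java sketch of the line-by-line scheme described above: try candidate languages in order, cache the first one that recognizes a month name, and re-detect when the cached one fails. MonthTable and CompositeMonthParser are hypothetical names, not part of Commons Net, and they only handle the month field, not a full listing line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical: maps one language's month abbreviations to 1..12. */
class MonthTable {
    final String language;
    private final List<String> names; // lowercase, January first

    MonthTable(String language, String... names) {
        if (names.length != 12) throw new IllegalArgumentException("need 12 names");
        this.language = language;
        this.names = Arrays.asList(names);
    }

    /** Returns 1..12, or -1 if the token is not a month in this language. */
    int month(String token) {
        int i = names.indexOf(token.toLowerCase(Locale.ROOT));
        return i < 0 ? -1 : i + 1;
    }
}

/** Hypothetical composite: tries tables in order and caches the last match. */
class CompositeMonthParser {
    private final List<MonthTable> candidates;
    private MonthTable cached; // null until the first successful match

    CompositeMonthParser(List<MonthTable> candidates) {
        this.candidates = new ArrayList<>(candidates);
    }

    int month(String token) {
        // Fast path: the language detected earlier still works.
        if (cached != null) {
            int m = cached.month(token);
            if (m > 0) return m;
        }
        // Slow path: first call, or the cached guess failed, so re-detect.
        for (MonthTable t : candidates) {
            if (t == cached) continue; // already tried above
            int m = t.month(token);
            if (m > 0) {
                cached = t;
                return m;
            }
        }
        return -1; // nothing matched: comparable to today's null entry
    }

    String cachedLanguage() {
        return cached == null ? null : cached.language;
    }
}
```

With English tried before German-without-umlauts, "Mar" is first detected as English; a later "Okt" fails the cached English table and switches the cache to German. That switch on failure is the churning in question.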


>
>  >What if none of
>  >the parsers in your composite works?  Then what?
>
> Like now: a "null" entry in the list of returned entries.
> Or we change the paradigm NET uses today and throw an exception, but
> that is worth a thread of its own ;-)
>
>  >2) Will we be opening ourselves to arguments as to which languages are "in"
>  >the composite?  Or in which order?  If you're using Italian and it has to try
>  >US English, British English, French and German first, your performance is
>  >going to be lousy.  Which brings me to
>
> Is there a difference between US and British?

The original complaint that got this started, about AIX, comes from Britain.
http://issues.apache.org/bugzilla/show_bug.cgi?id=27437
I believe the month names will be the same, but not necessarily the order.  On
Linux, I found that the month-day order was preserved regardless of locale
(although my test was with French, not en_UK).  In that defect there is an
example from AIX where there was a difference between en_US and en_UK.
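The point that month names can stay the same while the field order changes can be illustrated with java.text.SimpleDateFormat: the same Locale.US month names, two candidate patterns. The two patterns are assumptions for illustration, not the actual AIX en_US/en_UK formats from the bug report.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Locale;

class DayMonthOrder {
    // Hypothetical candidate layouts: same English month names, different field order.
    static final String[] PATTERNS = { "MMM d yyyy", "d MMM yyyy" };

    /** Returns the first pattern that parses the text, or null if none does. */
    static String detect(String text) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            f.setLenient(false); // reject impossible dates instead of rolling them over
            try {
                f.parse(text);
                return pattern;
            } catch (ParseException e) {
                // this layout does not fit; try the next one
            }
        }
        return null;
    }
}
```

Detecting the month names alone would not distinguish "Sep 30 2004" from "30 Sep 2004"; the pattern has to be known or detected as well.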

>
> Performance: As I said, we could cache the last matching language;
> then only the first search might be slow.
>
> Such a composite might only fail if one has to use the Croatian and Polish
> languages at once. There the names "lis" and "lip" mean different
> months (at least from the point of view of Java short names).

So you are saying that between these two languages, the same abbreviations
refer to different months?  That is a real problem for autodetection.  I guess
you could say it doesn't affect the most common languages.  But it doesn't
make me happy.
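To make the Croatian/Polish clash concrete, the sketch below compares two month tables and lists abbreviations that exist in both but name different months. It assumes three-letter abbreviations taken from the start of each month name (e.g. Croatian lipanj and listopad, Polish lipiec and listopad); the JDK's actual short names may differ by version, so the tables are illustrative only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class MonthClash {
    // Assumption: first three letters of each month name, January first.
    // Non-ASCII letters are written as Unicode escapes.
    static final String[] CROATIAN = {
        "sij", "vel", "o\u017eu", "tra", "svi", "lip",
        "srp", "kol", "ruj", "lis", "stu", "pro" };
    static final String[] POLISH = {
        "sty", "lut", "mar", "kwi", "maj", "cze",
        "lip", "sie", "wrz", "pa\u017a", "lis", "gru" };

    /** Returns (sorted) abbreviations present in both tables but naming different months. */
    static List<String> clashes(String[] a, String[] b) {
        Map<String, Integer> monthOfA = new HashMap<>();
        for (int i = 0; i < a.length; i++) monthOfA.put(a[i], i + 1);
        TreeSet<String> out = new TreeSet<>();
        for (int j = 0; j < b.length; j++) {
            Integer m = monthOfA.get(b[j]);
            if (m != null && m != j + 1) out.add(b[j]); // same text, different month
        }
        return new ArrayList<>(out);
    }
}
```

For these tables the result is [lip, lis]: June vs. July and October vs. November. An order-based composite would silently pick whichever language comes first, which is why a clash like this is hard for line-by-line autodetection.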

> This is why I am not against your solution at all; the composite parser
> should only be one additional possibility, and IMHO the default parser.

I agree that the composite parser could be a possibility.  I disagree 
vehemently that it should be the default.

>
> I think this composite could be configurable by a static map (system
> wide). There I would like to configure it to detect "US", "DE", "FR"
> (in this order), and I am fine with 100% of all FTP servers I have to
> contact today.
> In the case of Ant it could be configured by e.g. lang="US,DE,FR",
> or by a system property, or ... we could discuss this if we find
> a consensus at all.
>
> And we should also discuss that you don't want to take SYST into account,
> or at least the possibility to do so, but this also depends on which
> file entry parsers you would like to implement the date handling for.
> Currently I am only aware of Unix servers doing this language stuff.
>
>  >3) This is too much run-time trial and error for my tastes.  The average user
>  >of our library is not writing the ultimate FTP client.  He is writing a Java
>  >app or Ant script to connect repeatedly to an FTP server somewhere.  Once he
>  >gets the right parser, he never has need of trying others for that
>  >server.
>
> ... or using VFS. And VFS would like to be the super FTP, SSH, ...
> client. It should work like a filesystem: the user doesn't want to be
> bothered with things like date styles.

OK, I understand you a little better, you are approaching this from the angle 
of VFS.  So, you could make your composite parser the default parser USED BY 
VFS.  In other usages, where our user is simply setting up a little system to 
talk to some specific ftp server about which he knows all the details, the 
composite parser is a needless performance drain.
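Mario's configuration idea (a system-wide order such as lang="US,DE,FR", set through Ant or a system property) might look roughly like this. The property name ftp.listing.languages and the class LanguageOrder are hypothetical, and falling back to US alone is an assumption meant to keep today's behaviour when nothing is configured, so only a caller such as VFS that sets the property would get the composite order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class LanguageOrder {
    /** Hypothetical system property, e.g. -Dftp.listing.languages=US,DE,FR */
    static final String PROPERTY = "ftp.listing.languages";

    /** Parses "US, de ,FR" into [US, DE, FR], dropping blanks and duplicates. */
    static List<String> parse(String spec) {
        List<String> out = new ArrayList<>();
        if (spec != null) {
            for (String part : spec.split(",")) {
                String code = part.trim().toUpperCase(Locale.ROOT);
                if (!code.isEmpty() && !out.contains(code)) out.add(code);
            }
        }
        return Collections.unmodifiableList(out);
    }

    /** Reads the system-wide order; with nothing configured, only US is tried. */
    static List<String> systemWide() {
        List<String> order = parse(System.getProperty(PROPERTY));
        return order.isEmpty() ? Collections.singletonList("US") : order;
    }
}
```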

>
> To be sure, I am not fully against the solution you have in mind; I just
> would like to ensure it is possible to pass in a parser which uses a
> completely different strategy.
> And again: the user does not have to choose a file entry parser now; it
> is done automatically by SYST (I know you know ;-)),
> but now we force him to select the correct date format. Today, if he
> changes the URL (and an appropriate parser is available), the file
> parsing works without any additional attention.

No, we force him to do nothing.  My goal, expressed a few posts back, was that
the system work by default exactly as it does now.  The additional
functionality would only exist to help him out of the odd cases.  Changing
the default parser that autodetection provides could cause some real
surprises.

>
> <vision>
> Maybe we could provide a parser with a TreeMap where all month names and
> their numbers are stored; the community could help fill this map. Or a
> properties file which could easily be changed.
> </vision>
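The vision could start as a properties file loaded into a case-insensitive TreeMap. This is a sketch under assumptions: the class name MonthNameRegistry and the "name=number" file layout are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical community-maintained month table, e.g. read from a months.properties file. */
class MonthNameRegistry {
    // Case-insensitive keys, so "MAR", "Mar" and "mar" resolve alike.
    private final TreeMap<String, Integer> months =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Loads entries like "mar=3" or "okt=10"; names loaded later replace earlier ones. */
    void load(Reader source) {
        Properties p = new Properties();
        try {
            p.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : p.stringPropertyNames()) {
            int m = Integer.parseInt(p.getProperty(name).trim());
            if (m < 1 || m > 12) {
                throw new IllegalArgumentException("bad month number: " + name + "=" + m);
            }
            months.put(name, m);
        }
    }

    /** Returns 1..12, or -1 if the name is unknown. */
    int month(String name) {
        Integer m = months.get(name);
        return m == null ? -1 : m;
    }
}
```

A single flat map like this cannot hold both the Croatian and the Polish meaning of "lis"; that would need one map per language, which brings back the question of which language to try first.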
>
>  >4) On the other hand, your idea could be the basis of a pretty cool tool based
>  >on NetComponents: point it at an FTP server somewhere, let it try all the
>  >tricks it knows, and somehow it returns its best guess as to what parser and
>  >parser date format to use for that server.
>
> That's the point: like the comfort we provide with the automatic
> detection of the needed file entry parser.
> Computers should work for humans and not humans for computers ;-)
>
> As I tried to say earlier: today the parsing works pretty well; we only
> have problems with the month name (and unknown servers). As long as the
> date parts are not in a different order (based on the language), why
> implement such a drastic change in the comfort NET provides today?
> A black box where the user passes in a URL and gets a file listing is
> what the user really wants.

I think you are proposing a Swiss Army knife.  While this could be an
indispensable tool in a few situations, it's an inefficient answer for the
great majority of them.  Yes, there should be a Swiss Army knife, but it
should not be the default.
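The separate tool from point 4, which also answers the earlier "best score" question, could score every candidate language over a whole listing instead of stopping at the first match. A minimal sketch, assuming the month field has already been split out of each listing line; BestGuess and the table layout are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

class BestGuess {
    /**
     * Returns the language whose month table recognizes the most tokens, or null
     * if none recognizes any. On a tie the candidate iterated first wins, so pass
     * a LinkedHashMap to control the order.
     */
    static String guess(Map<String, List<String>> tables, List<String> monthTokens) {
        String best = null;
        int bestScore = 0;
        for (Map.Entry<String, List<String>> e : tables.entrySet()) {
            int score = 0;
            for (String token : monthTokens) {
                // Tables hold lowercase names; normalize the token to match.
                if (e.getValue().contains(token.toLowerCase(Locale.ROOT))) score++;
            }
            if (score > bestScore) { // strict '>' keeps the earlier candidate on ties
                best = e.getKey();
                bestScore = score;
            }
        }
        return best;
    }
}
```

Run once per server, with the result then fixed in configuration, this costs one full pass over a listing rather than trial and error on every connection.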
