Re: [digester] digester performance..
I'd discourage XPath, since it implies keeping the DOM in memory. If the XML document Patrick is parsing is large (thousands and thousands of megabytes), XPath is not efficient either.

Patrick, honestly I didn't understand the problem :) It sounds like a Lucene performance problem. Have you already tried writing to the Lucene ML?

Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/

On Tue, Apr 5, 2011 at 11:24 PM, Jimmy Zhang wrote:
> Have you considered using xpath instead of digester?
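A minimal sketch of reading XML without holding a DOM tree in memory, using the JDK's StAX pull parser (javax.xml.stream). This is not code from the thread: the element names (docs, doc, title) and the class name StreamingTitles are assumptions made for illustration, since the XML structure in the original message did not survive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingTitles {

    // Collects the text of every <title> element while streaming through the
    // input. Only the current parse event is held, not a tree of the document.
    // The element name "title" is hypothetical.
    public static List<String> readTitles(String xml) {
        List<String> titles = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // getElementText() reads up to the matching end tag
                    titles.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String xml = "<docs><doc><title>first</title></doc>"
                   + "<doc><title>second</title></doc></docs>";
        System.out.println(readTitles(xml)); // prints [first, second]
    }
}
```

For a real file, the StringReader would be replaced by a stream over that file. The trade-off is that a pull parser only moves forward, so anything needed later has to be copied out as it goes by.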
Re: [digester] digester performance..
Have you considered using XPath instead of Digester?

-----Original Message-----
From: Patrick Diviacco
Sent: Tuesday, April 05, 2011 4:08 AM
To: Commons Users List
Subject: [digester] digester performance..

hi,

I have a Java app and I recently stopped using Digester, because all my data is now kept in RAM and I no longer need to write/parse XML files.

However, since I stopped using Digester and external XML files, the performance of my app has got worse.

I now have the same data stored in an ArrayList> and I iterate over it with a for loop.

Before, the data was in an XML file with the following structure:

..
..

Is Digester really much faster at iterating over my data from an XML file than a for loop iterating over an ArrayList with the same content?

thanks
Re: [digester] digester performance..
Hi,

no, it is not the same program. I'm basically calling the method below in a for loop.

In my first app I invoked it only once over the entire index (30 rows), and it took 2 minutes.

Now I'm calling it in a loop for each row, because I need to update my index, which is growing (first iteration 1 row, then 2... then 2 again, then 3... and so on; it is a clustering algorithm and each row is a cluster).

It is supposed to be slow, but I'm surprised it takes more than 1 hour.

Thanks

    public static void performQuery(QueryDoc queryDoc) throws java.io.IOException {
        BooleanQuery booleanQuery = new BooleanQuery(true);
        notRelevant = new MatchAllDocsQuery();
        booleanQuery.add(notRelevant, BooleanClause.Occur.SHOULD);
        try {
            phrase = queryDoc.getTitle();
            for (int i = 0; i < phrase.length; i++) {
                title = new BooleanQuery();
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "title",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)).parse(phrase[i]),
                        BooleanClause.Occur.SHOULD);
            }
            phrase = queryDoc.getDescription();
            for (int i = 0; i < phrase.length; i++) {
                description = new BooleanQuery();
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "description",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)).parse(phrase[i]),
                        BooleanClause.Occur.SHOULD);
            }
            //time = new TermQuery(new Term("time",queryDoc.getTime()));
            //booleanQuery.add(time, BooleanClause.Occur.SHOULD);
            phrase = queryDoc.getTags();
            for (int i = 0; i < phrase.length; i++) {
                tags = new BooleanQuery();
                booleanQuery.add(new QueryParser(org.apache.lucene.util.Version.LUCENE_40, "tags",
                        new WhitespaceAnalyzer(org.apache.lucene.util.Version.LUCENE_40)).parse(phrase[i]),
                        BooleanClause.Occur.SHOULD);
            }
        } catch (ParseException pe) {
            //System.out.println(pe.getMessage());
        }
        topDocs = searcher.search(booleanQuery, 22);
        writeResults(topDocs, queryDoc);
    }

On 5 April 2011 15:45, Simone Tripodi wrote:
> Hi Patrick,
> if the Digester program you're talking about is the one you pasted
> here some time ago... well, a lot of optimizations were missed. For
> example, I suggested using the Lucene rules instead of storing all
> the properties in a POJO and then creating the Lucene Document; that
> way you limit the amount of stored data.
>
> When parsing a large XML document, as in your case, I suggest mapping
> to objects as little as possible and streaming more.
>
> HTH,
> Simo
>
> http://people.apache.org/~simonetripodi/
> http://www.99soft.org/
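A rough sketch of the arithmetic behind the growing loop described in this message. It rests on a loud assumption that the thread does not confirm: that each step runs one pass whose cost is proportional to the number of rows currently in the index, and that the index gains one row per step. The class and method names are hypothetical.

```java
public class GrowingIndexCost {

    // Total rows visited when step k queries an index holding k rows,
    // for k = 1..n. Assumed cost model, not measured from the real app.
    public static long rowVisitsGrowing(int n) {
        long total = 0;
        for (int size = 1; size <= n; size++) {
            total += size; // one pass over the rows present at this step
        }
        return total;
    }

    public static void main(String[] args) {
        int rows = 30; // row count quoted in the message
        System.out.println("single pass over full index: " + rows);
        System.out.println("one pass per step, growing:  " + rowVisitsGrowing(rows)); // 465
    }
}
```

Under that model the loop does 465 row visits instead of 30, roughly 15 times the work. That alone would not turn 2 minutes into more than an hour, so if the model is close to what the program does, some other per-call cost is probably involved as well.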
Re: [digester] digester performance..
Hi Patrick,

if the Digester program you're talking about is the one you pasted here some time ago... well, a lot of optimizations were missed. For example, I suggested using the Lucene rules instead of storing all the properties in a POJO and then creating the Lucene Document; that way you limit the amount of stored data.

When parsing a large XML document, as in your case, I suggest mapping to objects as little as possible and streaming more.

HTH,
Simo

http://people.apache.org/~simonetripodi/
http://www.99soft.org/

2011/4/5 Weiwei Wang :
> I don't think your program became slower because you are not using
> Digester; RAM should be much faster. I suggest you simplify the main
> part of your program and paste it in the email so that others can help.
Re: [digester] digester performance..
I don't think your program became slower because you are not using Digester; RAM should be much faster. I suggest you simplify the main part of your program and paste it in the email so that others can help.

On Tue, Apr 5, 2011 at 7:08 PM, Patrick Diviacco wrote:
> hi,
>
> I have a Java app and I recently stopped using Digester, because all my
> data is now kept in RAM and I no longer need to write/parse XML files.
>
> However, since I stopped using Digester and external XML files, the
> performance of my app has got worse.
>
> I now have the same data stored in an ArrayList> and I iterate over it
> with a for loop.
>
> Before, the data was in an XML file with the following structure:
>
> ..
> ..
>
> Is Digester really much faster at iterating over my data from an XML file
> than a for loop iterating over an ArrayList with the same content?
>
> thanks

--
王巍巍
Cell: 18911288489
MSN: ww.wang...@gmail.com
Blog: http://whisper.eyesay.org
Weibo: http://t.sina.com/lolorosa
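One way to follow the suggestion above and narrow the program down is to time each step of the loop separately. The sketch below is an illustration only: PhaseTimer and the phase names "buildQuery" and "search" are hypothetical, and the lambdas are empty placeholders where the real work would go.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PhaseTimer {
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();
    private final Map<String, Integer> callsByPhase = new LinkedHashMap<>();

    // Runs the work and adds its wall-clock time to the named phase.
    public void time(String phase, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
            callsByPhase.merge(phase, 1, Integer::sum);
        }
    }

    public int calls(String phase) {
        return callsByPhase.getOrDefault(phase, 0);
    }

    public long totalNanos(String phase) {
        return nanosByPhase.getOrDefault(phase, 0L);
    }

    // Prints one line per phase: call count and accumulated milliseconds.
    public void report() {
        for (String phase : nanosByPhase.keySet()) {
            System.out.printf("%s: %d calls, %.3f ms%n",
                    phase, calls(phase), totalNanos(phase) / 1_000_000.0);
        }
    }

    public static void main(String[] args) {
        PhaseTimer timer = new PhaseTimer();
        for (int row = 0; row < 30; row++) {
            timer.time("buildQuery", () -> { /* placeholder: build the query */ });
            timer.time("search", () -> { /* placeholder: run the search */ });
        }
        timer.report();
    }
}
```

System.nanoTime() is coarse for very short steps and says nothing about JIT warm-up, so the numbers are only a first hint about which phase dominates.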
[digester] digester performance..
hi,

I have a Java app and I recently stopped using Digester, because all my data is now kept in RAM and I no longer need to write/parse XML files.

However, since I stopped using Digester and external XML files, the performance of my app has got worse.

I now have the same data stored in an ArrayList> and I iterate over it with a for loop.

Before, the data was in an XML file with the following structure:

..
..

Is Digester really much faster at iterating over my data from an XML file than a for loop iterating over an ArrayList with the same content?

thanks