How much slower than anticipated is it?

I would start by using a StringBuffer/Builder rather than appending
(immutable) strings to each other.


11 aug 2007 kl. 19.05 skrev John Paul Sondag:

Hi,

I was hoping that maybe you guys could see if I'm somehow indexing
inefficiently. I'm putting relevant parts of my code below. I've looked at the "benchmarks" page on Lucene and my indexing time is taking a substantial amount of time more than what I see posted. I'm not sure when I should call flush() ( I saw that I should be doing that on the ImproveIndexingSpeed
page).  I'd really appreciate any advice.

Here's my code:

File directory = new File( "/mounts/falcon5/disks/0/tcheng3/ Dataset");
      File[] theFiles = directory.listFiles();

          //go through each file inside the directory and index it
          for(int curFile = 0; curFile < theFiles.length; curFile++)
          {
              File fin=theFiles[curFile];

              //open up the file
              FileInputStream inf = new FileInputStream(fin);
              InputStreamReader isr = new InputStreamReader(inf,
"US-ASCII");
              BufferedReader in = new BufferedReader(isr);
              String text="";
              String docid="";

            while (true) {

            //read in the file one line at a time, and act accordingly
                String line = in.readLine();
                if (line == null) { break;}

                 if (line.startsWith("<DOC>") ) {
                    //get docID
                    line = in.readLine();
                    String tempStr = line.substring(8,line.length());
                    int pos = tempStr.indexOf(' ');
                    docid = tempStr.substring(0,pos);
                    }else if (line.startsWith("</DOC>")) {

                    Document doc = new Document();

doc.add(new Field("contents",text, Field.Store.NO,
Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS ));
                    doc.add(new Field("DocID",docid, Field.Store.YES,
Field.Index.NO));
                      writer.addDocument(doc);
                    text="";
                } else {
                    text = text + "\n" + line;
                }
            }

        }


          int numIndexed = writer.docCount();

          writer.optimize();
          writer.close();


Thanks,

--JP


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to