Hi Steve, thanks a lot for your reply. The index now compresses to 50% of the original size. Is there any other possibility, using this code, to compress it up to 80%?

Steve Liles wrote:
>
> Compression aside, you could index the "contents" as terms in separate
> fields instead of tokenized text, and disable storing of norms:
>
> String outgoingNumber="9198408365809";
> String incomingNumber="9840861114";
>
> _doc.add(new Field("outgoingNumber", outgoingNumber, Store.NO,
>     Index.NO_NORMS));
> _doc.add(new Field("incomingNumber", incomingNumber, Store.NO,
>     Index.NO_NORMS));
>
> According to the docs, "Index.NO_NORMS" will save you one byte per
> indexed field for every document in the index.
>
> Or you could index all of the data as separate terms in the same
> "contents" field if you wanted (make the first param "contents" for all
> of the terms), which is more comparable to what you are currently doing.
> (Another advantage is that the Analyzer will not be used for fields
> which are untokenized, and indexing should be faster.)
>
> ...
>
> One way to compress numerical data (possibly not the best - I'm no
> expert) is to change the base of the number that is indexed / stored in
> the index.
>
> java.lang.Long and java.math.BigInteger have methods for converting from
> one radix to another. Taking your "outgoingNumber" as an example:
>
> //compression
> BigInteger _bi = new java.math.BigInteger("9198408365809", 10);
> System.out.println(_bi.toString(36));
>
> >39douufap
>
> //decompression
> BigInteger _bi = new java.math.BigInteger("39douufap", 36);
> System.out.println(_bi.toString(10));
>
> >9198408365809
>
> Converting to a higher radix will give you better compression, but you'll
> have to do it yourself, as the JDK classes only work up to base 36
> <http://en.wikipedia.org/wiki/Base_36>.
>
> It's worth compressing your unstored "contents" field as well as your
> stored "records" field, as the unique terms in the "contents" field will
> effectively be stored.
>
> Also don't forget to convert the terms when you search too, otherwise
> you won't find anything ;)
>
> Steve.
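
A minimal sketch of the "do it yourself" higher-radix conversion mentioned above, assuming base 62 with a digits/lowercase/uppercase alphabet. The class name Base62 and its encode/decode methods are hypothetical, not part of the JDK or Lucene. It only handles non-negative integers and drops leading zeros, so a value such as "070601" would come back as "70601" unless it is padded again after decoding.

```java
import java.math.BigInteger;

// Hypothetical helper: converts decimal strings to and from base 62.
final class Base62 {
    // Assumed alphabet: 10 digits + 26 lowercase + 26 uppercase = 62 symbols.
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final BigInteger RADIX = BigInteger.valueOf(ALPHABET.length());

    // Encodes a non-negative decimal string as a base-62 string.
    static String encode(String decimal) {
        BigInteger value = new BigInteger(decimal, 10);
        if (value.signum() < 0) {
            throw new IllegalArgumentException("negative numbers are not supported");
        }
        if (value.signum() == 0) {
            return "0";
        }
        StringBuilder sb = new StringBuilder();
        while (value.signum() > 0) {
            // Peel off the least significant base-62 digit.
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(RADIX);
            sb.append(ALPHABET.charAt(quotientAndRemainder[1].intValue()));
            value = quotientAndRemainder[0];
        }
        // Digits were collected least significant first.
        return sb.reverse().toString();
    }

    // Decodes a base-62 string back to its decimal string.
    static String decode(String encoded) {
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < encoded.length(); i++) {
            int digit = ALPHABET.indexOf(encoded.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 digit: " + encoded.charAt(i));
            }
            value = value.multiply(RADIX).add(BigInteger.valueOf(digit));
        }
        return value.toString(10);
    }

    public static void main(String[] args) {
        String outgoingNumber = "9198408365809";
        String packed = encode(outgoingNumber);
        System.out.println(packed + " (" + packed.length() + " characters)");
        System.out.println(decode(packed));
    }
}
```

For the 13-digit outgoingNumber this gives 8 characters, against 9 in base 36, so the radix change on its own saves roughly 38% on that term and is unlikely to reach 80% by itself. The encoded terms are case-sensitive, and an analyzer that lower-cases (StandardAnalyzer does) would corrupt them, so the field would have to be indexed untokenized, as in the NO_NORMS example above.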
>
> Sebastin wrote:
>> When I use the StandardAnalyzer, the storage size increases. How can I
>> minimize the index store?
>>
>> Sebastin wrote:
>>>
>>> String outgoingNumber="9198408365809";
>>> String incomingNumber="9840861114";
>>> String dateSc="070601";
>>> String imsiNumber="444021365987";
>>> String callType="1";
>>>
>>> //Search Fields
>>> String contents=(outgoingNumber+" "+incomingNumber+" "+dateSc+" "
>>>     +imsiNumber+" "+callType);
>>>
>>> //Display Fields
>>> String records=(callingPartyNumber+" "+calledPartyNumber+" "+dateSc+" "
>>>     +chargDur+" "+incomingRoute+" "+outgoingRoute+" "+timeSc);
>>>
>>> IndexWriter indexWriter = new
>>>     IndexWriter(indexDir,new StandardAnalyzer(),true);
>>>
>>> Document document = new Document();
>>>
>>> document.add(new
>>>     Field("contents",contents,Field.Store.NO,Field.Index.TOKENIZED));
>>>
>>> document.add(new
>>>     Field("records",records,Field.Store.YES,Field.Index.NO));
>>>
>>> indexWriter.setUseCompoundFile(true);
>>> indexWriter.addDocument(document);
>>> }
>>>
>>> Please help me to achieve the minimum size.
>>>
>>> Erick Erickson wrote:
>>>> Show us the code you use to index. Are you storing the fields?
>>>> omitting norms? Throwing out stop words?
>>>>
>>>> Best
>>>> Erick
>>>>
>>>> On 6/19/07, Sebastin <[EMAIL PROTECTED]> wrote:
>>>>> Hi, can anyone give me an idea for reducing the index size? I am now
>>>>> getting 42% compression in my index store and I want to reach 70%. I
>>>>> use StandardAnalyzer to write the document. When I use SimpleAnalyzer
>>>>> it reduces up to 58%, but I couldn't search the document. Please help
>>>>> me to achieve this.
>>>>>
>>>>> Thanks in advance
>>>>>
>>>>> Jeff-188 wrote:
>>>>>>> I found that reducing my index from 8G to 4G (through not stemming)
>>>>>>> gave me about a 10% performance improvement.
>>>>>>
>>>>>> How did you do this? I don't see this as an option.
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>> --
>>>>> View this message in context:
>>>>> http://www.nabble.com/ways-to-minimize-index-size--tf3401213.html#a11195406
>>>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>>>> For additional commands, e-mail: [EMAIL PROTECTED]

--
View this message in context: http://www.nabble.com/ways-to-minimize-index-size--tf3401213.html#a11249562
Sent from the Lucene - Java Users mailing list archive at Nabble.com.