Re: Indexed Data Size
Brett, it’s probably because you’ve hit Solr’s default maximum segment size of 5 GB. Once a segment reaches that size, a huge number of the docs within it must be marked as deleted before it will be merged. So even if a large share of the docs within the segment are deleted, the segment is still there, happily taking up space.

That could theoretically be a reason to optimize, but you’d want to specify maxSegments so that you don’t merge the entire index into a single segment.

Ideally, you should keep only as many of the logs as you actually use (which is hopefully fewer than you are keeping now). Since the segments will be somewhat time-based, they would eventually disappear or merge over time, hopefully removing any reason to optimize.

Greg

On Tue, Aug 13, 2019 at 3:31 PM Moyer, Brett wrote:
RE: Indexed Data Size
Turns out this is due to a job that indexes logs. We were able to clear some with another job. We are working through the value of these indexed logs. Thanks for all your help!

Brett Moyer
Manager, Sr. Technical Lead | TFS Technology
Public Production Support
Digital Search & Discovery

8625 Andrew Carnegie Blvd | 4th floor
Charlotte, NC 28263
Tel: 704.988.4508
Fax: 704.988.4907
bmo...@tiaa.org

-----Original Message-----
From: Shawn Heisey
Sent: Friday, August 9, 2019 2:25 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size

*
This e-mail may contain confidential or privileged information.
If you are not the intended recipient, please notify the sender
immediately and then delete it.

TIAA
*
Re: Indexed Data Size
On 8/9/2019 12:17 PM, Moyer, Brett wrote:
> The biggest is /data/solr/system_logs_shard1_replica_n1/data/index: files
> with the extensions I stated previously. Each is 5 GB and there are a few
> hundred, dating back over the last 3 months. I don’t understand why there
> are so many files with such small indexes. Not sure how to clean them up.

Can you get a screenshot of the core overview for that particular core? Solr should correctly calculate the size on the overview based on what files are actually in the index directory.

Thanks,
Shawn
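To get a number to hold up against the overview, the bytes in the index directory can also be totalled directly on disk. A minimal Python sketch, assuming you run it on the Solr host with read access to the directory; it counts only regular files directly inside the directory (a Lucene index directory is flat). The function names are made up for illustration.

```python
import os


def index_dir_bytes(index_dir: str) -> int:
    """Sum the sizes of the regular files directly inside index_dir."""
    total = 0
    with os.scandir(index_dir) as entries:
        for entry in entries:
            # Skip subdirectories and symlinks; Lucene index files are plain files.
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def human_size(n: int) -> str:
    """Format a byte count with binary (1024-based) units."""
    size = float(n)
    for unit in ("B", "KB", "MB", "GB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"
```

For example, human_size(index_dir_bytes("/data/solr/system_logs_shard1_replica_n1/data/index")) gives a figure to compare with the size on that core's overview page.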
RE: Indexed Data Size
Correct, our indexes are small document-wise, but for some reason we have a year's worth of files in the data/solr folders. There are no index. files. The biggest is /data/solr/system_logs_shard1_replica_n1/data/index: files with the extensions I stated previously. Each is 5 GB and there are a few hundred, dating back over the last 3 months. I don’t understand why there are so many files with such small indexes. Not sure how to clean them up.

-----Original Message-----
From: Shawn Heisey
Sent: Friday, August 9, 2019 9:11 AM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size
Re: Indexed Data Size
On 8/9/2019 6:12 AM, Moyer, Brett wrote:
> Thanks! We update each index nightly. We don’t clear it, but bring in new
> documents and deltas, and delete expired/404 pages. All our data are
> basically webpages, so none are very large. Some PDFs, but again not too
> large. We are running Solr 7.5; hopefully you can access the links.

Solr is saying that the entire size of the index directory is 95 MB for one of those indexes and 30 MB for the other. Those sound to me like very small indexes, not the very large ones you indicated.

You were saying that the large files were in data/index, and did not mention anything about index. directories. If you do have a bunch of index. directories in the "Data" directory mentioned on the Core overview page, you can safely delete all of the index and/or index.* directories under that directory EXCEPT the one that is indicated as the "Index" directory. If you delete that one, you're deleting the actual live index ... and since you're not on Windows, the OS will let you delete it without complaining.

The directory locations are cut off on both screenshots, so I can't confirm anything there.

The larger core has about 2000 deleted docs and the smaller one has 40. Doing an optimize will not save much disk space or take very long.

Thanks,
Shawn
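The rule above (every index or index.* directory in the data directory except the live one) can be turned into a dry run that only lists candidates. A sketch, assuming you copy the "Index" path from the core overview by hand; stale_index_dirs is a hypothetical helper, not a Solr tool, and it deletes nothing. It refuses to run if the live directory isn't found, so a typo in that path can't turn every index directory into a candidate.

```python
import os
from typing import List


def stale_index_dirs(data_dir: str, live_index_dir: str) -> List[str]:
    """List index / index.* directories under data_dir other than the live one.

    Dry run only: nothing is deleted. live_index_dir is the path shown as
    "Index" on the core overview page; only its last component is compared.
    """
    live_name = os.path.basename(os.path.normpath(live_index_dir))
    names = sorted(os.listdir(data_dir))
    if live_name not in names:
        # Refuse to guess: a wrong live path would make every index dir a candidate.
        raise ValueError(f"live index dir {live_name!r} not found in {data_dir}")
    candidates = []
    for name in names:
        path = os.path.join(data_dir, name)
        if name == live_name or not os.path.isdir(path):
            continue  # keep the live index; skip files such as index.properties
        if name == "index" or name.startswith("index."):
            candidates.append(path)
    return candidates
```

Usage would be along the lines of stale_index_dirs("/path/to/core/data", "/path/to/core/data/index.<timestamp>") with both paths (hypothetical here) taken from the overview page; review the printed list before removing anything.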
RE: Indexed Data Size
Thanks! We update each index nightly. We don’t clear it, but bring in new documents and deltas, and delete expired/404 pages. All our data are basically webpages, so none are very large. Some PDFs, but again not too large. We are running Solr 7.5; hopefully you can access the links.

https://www.dropbox.com/s/lzd6hkoikhagujs/CoreOne.png?dl=0
https://www.dropbox.com/s/ae6rayb38q39u9c/CoreTwo.png?dl=0

Brett

-----Original Message-----
From: Erick Erickson
Sent: Thursday, August 8, 2019 5:49 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexed Data Size
Re: Indexed Data Size
On 8/8/2019 3:17 PM, Moyer, Brett wrote:
> In our data/solr//data/index on the filesystem, we have files that go
> back 1 year. I don’t understand why and I doubt they are in use. Files
> with extensions like fdx, cfe, doc, pos, tip, dvm, etc. Some of these are
> very large and are running us out of server space. Our search indexes
> themselves are not large; in total we might have 50k documents. How can I
> reduce this /data/solr space? Is this what the Solr Optimize command is
> for? Thanks!

+1 to everything Erick said.

Another piece of information that could be helpful is a screenshot of the core overview in the admin UI. It would look something like this:

https://www.dropbox.com/s/mbh6ll1v8ghloko/solr-core-overview.png?dl=0

To get that, just go to the admin UI and choose one of the big cores from the core dropdown. That should put you on the overview tab for the core. Then grab a screenshot and use a file-sharing site to share it.

Thanks,
Shawn
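If screenshots are awkward, most of the same figures can probably be pulled from the CoreAdmin STATUS API instead. A sketch, assuming a node at http://localhost:8983/solr and the response layout as I remember it from Solr 7.x (status, then the core name, then an index object with numDocs, maxDoc, deletedDocs and sizeInBytes); check the key names against your own response before relying on them. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def core_status_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin STATUS request URL for one core."""
    query = urlencode({"action": "STATUS", "core": core, "wt": "json"})
    return f"{base_url.rstrip('/')}/admin/cores?{query}"


def summarize_core(status: dict, core: str) -> dict:
    """Pick out overview-style numbers from a parsed STATUS response.

    Key names are assumed from Solr 7.x output; verify against your own.
    """
    info = status["status"][core]
    idx = info["index"]
    max_doc = idx["maxDoc"]
    return {
        "dataDir": info.get("dataDir"),
        "numDocs": idx["numDocs"],
        "deletedDocs": idx["deletedDocs"],
        # Share of maxDoc that is deleted: a rough bound on what merging could reclaim.
        "deletedPct": 100.0 * idx["deletedDocs"] / max_doc if max_doc else 0.0,
        "sizeInBytes": idx["sizeInBytes"],
    }


def fetch_core_summary(base_url: str, core: str) -> dict:
    """Query a live node (network access required) and summarize one core."""
    with urlopen(core_status_url(base_url, core)) as resp:
        return summarize_core(json.load(resp), core)
```

Something like fetch_core_summary("http://localhost:8983/solr", "system_logs_shard1_replica_n1") would then report the data directory, document counts and index size for the core whose directory is mentioned earlier in the thread.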
Re: Indexed Data Size
On the surface, this makes no sense at all, so there’s something I don’t understand here ;).

How often do you update your index? Having files from a long time ago is perfectly reasonable if you’re not updating regularly. But your statement that some of these are huge for just a 50K-document index is odd unless they’re _huge_ documents.

I wouldn’t optimize unless you’re on Solr 7.5+, since before that an optimize creates a single segment. See:
https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/
and
https://lucidworks.com/post/solr-and-optimizing-your-index-take-ii/

The extensions you mentioned are perfectly reasonable. Each segment is made up of multiple files; .fdt, for instance, contains stored data. See:
https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/codecs/lucene62/package-summary.html

Can you give us a long listing of one of your index directories?

Best,
Erick
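For the long listing Erick asks for, grouping the files by segment makes it easier to see which segments hold the space and how old they are. A sketch, assuming the Lucene naming pattern where every per-segment file starts with the segment name (_0, _1a, ...) and files such as segments_N and write.lock belong to no segment. The extension descriptions are paraphrased from the Lucene codec summary linked above; I believe the 7.x codecs use the same extensions, but check the docs for your version.

```python
import os
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Paraphrased from the Lucene codec file-format summary (lucene62 package docs).
LUCENE_EXTENSIONS = {
    "si": "segment info", "cfs": "compound file", "cfe": "compound file entries",
    "fnm": "field infos", "fdx": "stored fields index", "fdt": "stored fields data",
    "tim": "term dictionary", "tip": "term index", "doc": "frequencies",
    "pos": "positions", "pay": "payloads", "nvd": "norms data", "nvm": "norms metadata",
    "dvd": "doc values data", "dvm": "doc values metadata", "tvx": "term vector index",
    "tvd": "term vector data", "liv": "live docs (deletes)", "dii": "points index",
    "dim": "points data",
}

Entry = Tuple[str, int, float]  # (file name, size in bytes, modification time)


def segment_of(name: str) -> Optional[str]:
    """'_a.fdt' -> '_a', '_a_Lucene50_0.doc' -> '_a', 'segments_5' -> None."""
    if not name.startswith("_"):
        return None
    stem = name.split(".", 1)[0]
    return "_" + stem[1:].split("_", 1)[0]


def group_by_segment(index_dir: str) -> Dict[Optional[str], List[Entry]]:
    """Collect (name, size, mtime) for each regular file, keyed by segment."""
    groups: Dict[Optional[str], List[Entry]] = defaultdict(list)
    with os.scandir(index_dir) as entries:
        for e in entries:
            if e.is_file(follow_symlinks=False):
                st = e.stat(follow_symlinks=False)
                groups[segment_of(e.name)].append((e.name, st.st_size, st.st_mtime))
    return groups


def print_long_listing(index_dir: str) -> None:
    """Print segments largest first, each with its files, sizes and dates."""
    groups = group_by_segment(index_dir)
    order = sorted(groups, key=lambda s: -sum(size for _, size, _ in groups[s]))
    for seg in order:
        files = groups[seg]
        total = sum(size for _, size, _ in files)
        print(f"{seg or '(commit/lock files)'}: {total} bytes in {len(files)} files")
        for name, size, mtime in sorted(files):
            ext = name.rsplit(".", 1)[-1] if "." in name else ""
            when = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
            desc = LUCENE_EXTENSIONS.get(ext, "other")
            print(f"  {size:>12}  {when}  {name}  ({desc})")
```

Running print_long_listing on /data/solr/system_logs_shard1_replica_n1/data/index would show whether the space sits in a few old, large segments, which is what you would expect if segments at the maximum size are waiting for enough deletes before they get merged.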
Indexed Data Size
In our data/solr//data/index on the filesystem, we have files that go back 1 year. I don’t understand why and I doubt they are in use. Files with extensions like fdx, cfe, doc, pos, tip, dvm, etc. Some of these are very large and are running us out of server space. Our search indexes themselves are not large; in total we might have 50k documents. How can I reduce this /data/solr space? Is this what the Solr Optimize command is for? Thanks!

Brett