Parallel SQL Interface and 'qt'
Hi,

I've just started to look into the Parallel SQL interface available in SOLR. I've done some tests across a few collections, and it works fairly well. However, I've run into an issue with a few collections where the SQL interface does not return any data.

According to the documentation, it seems to rely on the default /select handler to look up and return data. However, the collections that do not return data in the SQL interface all use /select handlers with logic that requires additional parameters to return data.

I know that if you define 'handleSelect' to true in the /select handler, you can pass the 'qt' parameter to define which handler to use on the fly. The handler in question is configured to use 'handleSelect'; however, when I pass the 'qt' parameter to the /sql handler, it does not seem to work.

I've gone through the documentation, but I can't find any information on this. Is it possible to define which handler to use when using the /sql handler?

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

https://tolecnal.net -- tolecnal at tolecnal dot net
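For anyone wanting to reproduce the request described above, here is a minimal sketch in Python. It only builds the POST request for the /sql handler; it does not send it. The 'stmt' parameter is how the /sql handler takes its statement. The base URL, the collection name 'collection1' and the handler name '/myhandler' are made up for illustration, and whether /sql honours 'qt' at all is exactly the open question in this message.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_sql_request(base_url, collection, stmt, extra_params=None):
    """Build (but do not send) a POST request for Solr's /sql handler.

    extra_params is where something like 'qt' would go; the message above
    reports that passing it does not seem to change which handler is used.
    """
    params = {"stmt": stmt}
    params.update(extra_params or {})
    url = f"{base_url.rstrip('/')}/{collection}/sql"
    return Request(url, data=urlencode(params).encode("utf-8"), method="POST")


# Hypothetical values, for illustration only.
req = build_sql_request(
    "http://localhost:8983/solr",
    "collection1",
    "SELECT id FROM collection1 LIMIT 10",
    extra_params={"qt": "/myhandler"},
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Sending it is then a matter of passing 'req' to urllib.request.urlopen against a running Solr instance.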
Re: Get transaction count from ZooKeeper transaction logs
On Wed, 11 Jul 2018 at 02:48, Shawn Heisey wrote:
> On 7/10/2018 3:32 PM, Jostein Elvaker Haande wrote:
> > I'm trying to find an effective way to find the number of transactions
>
> You have detailed questions about the inner workings of ZooKeeper. This
> is not a ZooKeeper mailing list. It is a Solr mailing list. The
> ZooKeeper project has its own support resources. That project is going
> to be in a far better position to have access to the information that
> you need.

Thanks Shawn, I've redirected my question to the ZooKeeper mailing list.

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
Get transaction count from ZooKeeper transaction logs
Hello,

I'm trying to find an effective way to determine the number of transactions stored in the ZooKeeper transaction logs. The only method I've found so far is the Java class 'org.apache.zookeeper.server.LogFormatter', which outputs the following after it has formatted a log file:

    EOF reached after 203 txns.

I could of course write a script that runs each log file through this log formatter and extracts the count from the last line of stdout, but I'm wondering if there's an easier method. I've read through the ZK documentation and tried the ZK commands (aka The Four Letter Words) to see if any of them offer this metric, but I could not find it.

So my question is: is there a simpler approach to find this count?

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
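The scripted fallback described above could look roughly like the following Python sketch. It runs LogFormatter on each log file and pulls the number out of the "EOF reached after N txns." line. The classpath argument and the 'log.*' file-name pattern are assumptions about the local ZooKeeper install, not something stated in the message.

```python
import re
import subprocess
from pathlib import Path

# Matches the summary line LogFormatter prints, e.g. "EOF reached after 203 txns."
TXN_LINE = re.compile(r"EOF reached after (\d+) txns")


def parse_txn_count(formatter_output):
    """Return the transaction count from LogFormatter output, or None if absent."""
    matches = TXN_LINE.findall(formatter_output)
    return int(matches[-1]) if matches else None


def count_txns(log_file, classpath):
    """Run LogFormatter on one log file; classpath must point at the ZooKeeper jars."""
    result = subprocess.run(
        ["java", "-cp", classpath,
         "org.apache.zookeeper.server.LogFormatter", str(log_file)],
        capture_output=True, text=True, check=True,
    )
    return parse_txn_count(result.stdout)


def total_txns(log_dir, classpath):
    """Sum counts over all files matching 'log.*' (assumed naming) in log_dir."""
    files = sorted(Path(log_dir).glob("log.*"))
    return sum(count_txns(f, classpath) or 0 for f in files)


print(parse_txn_count("...\nEOF reached after 203 txns.\n"))
```

Keeping the parsing separate from the subprocess call means the regex can be checked without a ZooKeeper install at hand.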
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 17:46, Erick Erickson wrote:
> through a clever bit of reflection, you can set the
> reclaimDeletesWeight variable from solrconfig by including something
> like
> 5 (going from memory
> here, you'll get an error on startup if I've messed it up.)

I added the following to my solrconfig a couple of days ago:

8 8 5.0

There have been several commits and the core is current according to SOLR admin, but I'm still seeing a lot of deleted docs. These are my current core statistics:

Last Modified: 4 minutes ago
Num Docs: 1 675 255
Max Doc: 2 353 476
Heap Memory Usage: 208 464 267
Deleted Docs: 678 221
Version: 1 870 539
Segment Count: 39

Index size is close to 149GB. So at the moment, the ratio of deleted docs to max docs is about 28.8%. With 'reclaimDeletesWeight' set to 5, it doesn't seem to be merging away any deleted docs. Anything obvious I'm missing?

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
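The percentage above follows directly from the core statistics: deleted docs are Max Doc minus Num Docs, divided by Max Doc. A small Python sketch of that arithmetic, using the figures from this message; the 15% threshold is the rule of thumb quoted elsewhere in this thread, and the names are made up for illustration.

```python
# Rule-of-thumb threshold quoted elsewhere in this thread (percent of total docs).
WORRY_THRESHOLD_PCT = 15.0


def deleted_ratio(max_doc, num_docs):
    """Return (deleted_docs, deleted_percentage_of_max_doc)."""
    deleted = max_doc - num_docs
    return deleted, 100.0 * deleted / max_doc


# Figures taken from the core statistics in the message above.
deleted, pct = deleted_ratio(max_doc=2_353_476, num_docs=1_675_255)
print(f"deleted docs: {deleted}, ratio: {pct:.2f}%")
print("above threshold" if pct >= WORRY_THRESHOLD_PCT else "below threshold")
```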
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 12:25, Markus Jelsma wrote:
> Hello - with TieredMergePolicy and default reclaimDeletesWeight of 2.0, and
> frequent updates, it is not uncommon to see a ratio of 25%. If you want
> deletes to be reclaimed more often, e.g. weight of 4.0, you will see very
> frequent merging of large segments, killing performance if you are on
> spinning disks.

Most of our installations are on spinning disks, so a more aggressive reclaim would impact performance. That is of course not something I want, so I'm wondering if scheduling a commit with 'expungeDeletes' during off-peak business hours is a better approach than setting up a more aggressive merge policy.

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
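One way the off-peak idea above might be scripted is sketched below in Python, under stated assumptions. The 01:00 to 05:00 window is an invented example, the function names are hypothetical, and the request body is the standard Solr XML commit message with expungeDeletes="true". The script would be run from cron or a similar scheduler; nothing here is sent unless expunge_if_off_peak is actually called.

```python
from datetime import datetime, time
from urllib.request import Request, urlopen

# Assumed off-peak window; adjust to the installation's real quiet hours.
OFF_PEAK_START = time(1, 0)
OFF_PEAK_END = time(5, 0)


def is_off_peak(now, start=OFF_PEAK_START, end=OFF_PEAK_END):
    """True if 'now' falls in [start, end); handles windows that wrap midnight."""
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def build_expunge_request(core_url):
    """Build a POST to <core_url>/update carrying an expungeDeletes commit."""
    body = b'<commit expungeDeletes="true"/>'
    return Request(
        f"{core_url.rstrip('/')}/update",
        data=body,
        headers={"Content-Type": "text/xml"},
        method="POST",
    )


def expunge_if_off_peak(core_url, now=None):
    """Send the commit only inside the off-peak window; return whether it was sent."""
    now = now or datetime.now()
    if not is_off_peak(now):
        return False
    with urlopen(build_expunge_request(core_url)) as resp:
        resp.read()
    return True
```

The time check lives in the script rather than only in the scheduler so that a manual run during business hours does nothing.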
Re: Deleted documents and expungeDeletes
On 30 March 2016 at 02:49, Erick Erickson wrote:
> Please specify "growing and growing", Until it gets to 15% or more of the
> total then I'd start to worry. And then only if it kept growing after that.

I tested 'expungeDeletes' on four different cores; three of them were nearly identical in terms of numbers. Max Docs was around 2.2M, Num Docs was ~1.6M and Deleted Docs was ~600K, so the percentage of Deleted Docs was around the 27 percent mark. So according to your feedback, I should start to worry! Now the question is, why aren't the Deleted Docs being merged away if this is in fact supposed to happen?

> 1> This is automatic. It'll "just happen", but you will probably always carry
> some deleted docs around in your index.

Yeah, that I am aware of. I noticed that even after running 'expungeDeletes' I had a few thousand docs left, which is acceptable and does not worry me.

> 4> True, but usually the effect is so minuscule that nobody notices.
> People spend endless time obsessing about this and unless and until you
> can show that your _users_ notice, I'd ignore it.

Hehe, then I'll refrain from being one of those who obsess over this. As long as I know the effect is minuscule, I'll just toss the thought in the bin.

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net
Deleted documents and expungeDeletes
Hello everyone,

I apologise beforehand if this is a question that has been visited numerous times on this list, but after hours spent on Google and talking to SOLR-savvy people on #solr @ Freenode, I'm still a bit at a loss about SOLR and deleted documents.

I have quite a few indexes in both production and development environments, where I see that the number of deleted documents just keeps on growing and growing, but they never seem to be deleted. From my understanding, this can be controlled in the merge policy set for the current core, but I've not been able to find any specifics on the topic.

The general consensus on most search hits I've found is to perform an optimize of the core. However, this is an expensive operation, both in terms of CPU cycles and disk I/O, and it also requires anywhere from 2 to 3 times the size of the index to be available on disk to be guaranteed to complete fully. Given these criteria, it's often not a viable option in certain environments, both because it is a resource hog and because you often just don't have the available disk space needed to perform the optimize.

After having spoken with a couple of people on IRC (thanks tokee and elyograg), I was made aware of an optional commit parameter called 'expungeDeletes' that can explicitly make sure that deleted documents are deleted from the index, e.g.:

    curl http://localhost:8983/solr/coreName/update -H "Content-Type: text/xml" --data-binary '<commit expungeDeletes="true"/>'

Now my questions are as follows:

1) How can I make sure that this is dealt with in my merge policy, if at all possible?

2) I've tried to find some disk space guidelines for 'expungeDeletes', but I've not been able to find any. What are the general guidelines here? Does it require as much space as an optimize, or is it less "aggressive" compared to an optimize?

3) Is 'expungeDeletes' the recommended method to make sure your deleted documents are actually removed from the index, or should you deal with this in your merge policy?

4) I have also heard from talks on #SOLR that deleted documents have an impact on the relevancy of performed searches. Is this correct, or just misinformation?

If you require any additional information, like snippets from my configuration (solrconfig.xml), I'm more than happy to provide this.

Again, if this is an issue that's being revisited for the Nth time, I apologise. I'm just trying to get my head around this with my somewhat limited SOLR knowledge.

--
Yours sincerely
Jostein Elvaker Haande
"A free society is a society where it is safe to be unpopular" - Adlai Stevenson

http://tolecnal.net -- tolecnal at tolecnal dot net