[PD] search plugin update (was: Re: reverse kickstarter update)
Hi list,

Attached is a first pass at using the Xapian backend to search the Pure Data docs.

What the revision does:
* simplifies building a search index. The index is built once, on the first search, and all subsequent searches are very fast. Previously the plugin searched the docs themselves every single time and depended on the OS caching the data, which made it sluggish, especially on Windows.
* natural-language, probabilistic searches. The search terms in the index were chosen automatically by the engine with no customization, and the results are already decent.
* nearly no input errors. Xapian has its own simple syntax, but in most cases users can ignore it and type natural-language searches (like Google). The few errors a user can trigger print meaningful feedback to the console. Also, since I'm passing the input as a string, you don't have to worry about malformed Tcl lists or odd characters that previously caused errors.
* everything, including Pd files, PDFs and HTML, is indexed properly and so shows up in the results in the proper place.
* makes it possible to add results from a remote database with a couple of lines of code.
* allows removing the "Match all terms" and "Match whole words" checkbuttons, which simplifies the interface.
* performs stemming out of the box; that is, when searching for "edit", the engine also takes into account "editing", "edits", "edited", etc.

Installation for Linux (Debian):
1) Make sure the libxapian and tclxapian packages are installed. Other distros probably have corresponding packages.
2) Put search-plugin.tcl in the /startup directory, or, if you're using Pd vanilla, just make sure it's in a directory that's listed in the Path dialog.
3) Run Pd and press Ctrl-H, or choose Search from the Help menu.

Further work that needs to be done:
* need to figure out where to create the database directory on Linux, OS X, and Windows. The directory needs to be readable and writable. Is there an easy way to do this?
* need a Cancel button next to the progress bar while indexing, so the user can cancel a long indexing run.

Further work that could be done:
* add Pd meta tags/values to the index terms for each document. This would make it possible to type keyword:foo or author:bar to search based solely on that meta tag/value.
* add filenames to the terms
* add object terms so the user can search Pd patches for a particular object instance, e.g., object:clip
* limit the document data stored in the database to Pd meta tags/values and other metadata. Right now I'm storing the _entire_ doc text in the database, which obviously wastes space.
* Xapian has all kinds of features, like suggesting related searches and real-time results. The latter could be very handy for autocompletion in object boxes, for example.
* could use the title of HTML files as the description, for better result descriptions
* could plug in to puredata.info to search for externals, plugins, etc.

As always, feedback welcome. And feel free to donate some rice and beans if you can!
https://jwilkes.nfshost.com/donations.php

Best,
Jonathan

# browse docs or search all the documentation using a regexp
# check the Help menu for the Browser item to use it

# todo: use xapian syntax for meta keywords
#keyword:foo
# todo: when cancelling a db index build, we need to remove
# the database completely
# todo: remove both checkbuttons-- not needed
# todo: do newline regsub and document parsing on indexing
# todo: make libdir listing check for duplicates
# todo: hook into the dialog_bindings
# TODO remove the doc_ prefix on procs where it's not needed
# TODO enter and up/down/left/right arrow key bindings for nav

# redesign:
# [ search entry ] Help
# [search] [filter]
#

package require Tk 8.5
package require pd_bindings
package require pd_menucommands
package require xapian 1.0.0

namespace eval ::dialog_helpbrowser2:: {
    variable doctypes *.{pd,pat,mxb,mxt,help,txt,htm,html,pdf}
    variable searchfont [list {DejaVu Sans}]
    variable searchtext {}
    variable search_history {}
    variable count {}
    # $i controls the build_index recursive loop
    variable i
    variable filelist {}
    variable progress {}
    variable navbar {}
    variable genres
    variable cancelled
    variable database {}
}

## help browser and support functions
#

proc ::dialog_helpbrowser2::open_helpbrowser {mytoplevel} {
    if {[winfo exists $mytoplevel]} {
        wm deiconify $mytoplevel
        raise $mytoplevel
    } else {
        create_dialog $mytoplevel
    }
}

proc ::dialog_helpbrowser2::create_dialog {mytoplevel} {
    variable searchfont
    variable selected_file
    variable genres [list [_ "All documents"] \
                         [_ "Object Help Patches"] \
                         [_ "All About Pd"] \
                         [_ "Tutorials"] \
                         [_ "Manual"] \
                         [_ "Uncategorized"] \
                    ]
    variable count
    foreach genre $genres {
        lappend
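The "build once on the first search", stemming and "input as a string" points can be illustrated without Xapian. The sketch below is a toy stand-in, not the tclxapian API and not the plugin's code: a hypothetical `::toyindex` namespace that builds an inverted index lazily on the first search, applies a deliberately crude suffix stripper (Xapian itself uses Snowball stemmers), and splits the query with `regexp` rather than treating it as a Tcl list, so an unbalanced quote or brace in the input cannot cause a list-parsing error.

```tcl
# Toy stand-in for a search index (hypothetical names; NOT the tclxapian API).
namespace eval ::toyindex {
    variable docs  {}   ;# dict: document name -> document text
    variable index {}   ;# dict: stemmed term -> list of document names
    variable built 0    ;# becomes 1 once the first search has built the index
}

# Deliberately crude suffix stripper: editing/edits/edited -> edit.
proc ::toyindex::stem {word} {
    set word [string tolower $word]
    foreach suffix {ing ed es s} {
        set n [string length $suffix]
        # strip the suffix only if at least three characters remain
        if {[string match "*$suffix" $word] && [string length $word] > $n + 2} {
            return [string range $word 0 end-$n]
        }
    }
    return $word
}

# Split free-form text into words with a regexp, so the input never has
# to be a well-formed Tcl list.
proc ::toyindex::terms {text} {
    set result {}
    foreach word [regexp -all -inline {[[:alnum:]_~]+} $text] {
        lappend result [stem $word]
    }
    return $result
}

# Build the whole index in one pass over the documents.
proc ::toyindex::build {} {
    variable docs
    variable index
    variable built
    set index [dict create]
    dict for {name text} $docs {
        foreach term [lsort -unique [terms $text]] {
            dict lappend index $term $name
        }
    }
    set built 1
}

# Return document names ranked by how many distinct query terms they contain.
proc ::toyindex::search {query} {
    variable index
    variable built
    if {!$built} { build }   ;# only the first search pays for indexing
    set scores [dict create]
    foreach term [lsort -unique [terms $query]] {
        if {[dict exists $index $term]} {
            foreach name [dict get $index $term] { dict incr scores $name }
        }
    }
    set pairs {}
    dict for {name score} $scores { lappend pairs [list $name $score] }
    set ranked {}
    foreach pair [lsort -integer -decreasing -index 1 $pairs] {
        lappend ranked [lindex $pair 0]
    }
    return $ranked
}

# Usage: two fake documents; the first call builds the index.
set ::toyindex::docs [dict create \
    editing.pd    "Editing a patch: edits can be undone" \
    clip~-help.pd "clip~ restricts a signal to a range"]
puts [::toyindex::search {edited "clip~}]   ;# the unbalanced quote is harmless
```

A real engine adds ranking weights, phrase queries and persistence on disk; the point here is only that the indexing cost is paid once and later searches just look terms up.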
Re: [PD] search plugin update (was: Re: reverse kickstarter update)
On Sep 15, 2013, at 3:23 PM, pd-list-requ...@iem.at wrote:

> * need to figure out where to create the database directory on Linux, OSX, and Windows. The directory needs to be read/writable. Is there an easy way to do this?

For Linux and Windows, why not put it in the same location as the Pd settings file? On OS X, I'd put it in ~/Library/Application Support/pd (or pd-extended).

Dan Wilcox
@danomatika
danomatika.com
robotcowboy.com

___
Pd-list@iem.at mailing list
UNSUBSCRIBE and account-management - http://lists.puredata.info/listinfo/pd-list
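Dan's suggestion could be sketched as follows. The procs `db_directory` and `ensure_db_directory` and the `search-db` and `.pd` folder names are hypothetical. The OS X path is Dan's; on Linux the sketch assumes a hidden folder in the home directory (as far as I know, Pd vanilla keeps its Linux settings in `~/.pdsettings` there), and on Windows it assumes `%APPDATA%` as a writable per-user location.

```tcl
# Hypothetical helper: pick a per-user database directory.
# os      - value of $::tcl_platform(os), e.g. Linux, Darwin or "Windows NT"
# home    - the user's home directory
# appdata - %APPDATA% on Windows (optional)
proc db_directory {os home {appdata ""}} {
    switch -- $os {
        Darwin {
            # Dan's suggestion for OS X
            return [file join $home Library "Application Support" pd search-db]
        }
        "Windows NT" {
            # assumption: %APPDATA% is a writable per-user location
            if {$appdata eq ""} {
                set appdata [file join $home AppData Roaming]
            }
            return [file join $appdata Pd search-db]
        }
        default {
            # Linux and other Unix: a hidden folder in the home directory
            # (the folder name is made up for this sketch)
            return [file join $home .pd search-db]
        }
    }
}

# Create the directory if needed and fail loudly if it is not writable.
proc ensure_db_directory {dir} {
    file mkdir $dir   ;# also creates missing parents; no error if it exists
    if {![file writable $dir]} {
        error "search database directory is not writable: $dir"
    }
    return $dir
}
```

In the plugin this might be called as `ensure_db_directory [db_directory $::tcl_platform(os) $::env(HOME)]`, passing `$::env(APPDATA)` as the third argument on Windows.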
Re: [PD] Search plugin update
This definitely sounds quite useful. Scrolling to the selection is not easy to do right now, but it's something that could be made easy. Basically, if the selection were tagged with a tag that marks it as the selection, it would be easy to find the selected object's location and scroll to it, all in Tcl. If you grep the pd-extended source for select_color, you'll find the spots that need to be changed. I think the trickier bit might be removing the tag once the selection is done.

This is a patch I'd accept, and I would lobby to have Miller include it too. This is a behavior that Pd's Find panel should also have.

.hc

On 01/21/2013 01:04 AM, Jonathan Wilkes wrote:
> Unfortunately ctrl-f Find won't scroll to the relevant part of a long patch if the match happens to be out of view. Is there a way to fix this?
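Hans's idea could look roughly like this in Tcl. It assumes Pd would add a tag such as `selected` to selected canvas items (Hans points to the code around `select_color` as the place to add one); `center_fraction` and `scroll_to_tag` are hypothetical names, while `bbox`, `cget -scrollregion` and `xview`/`yview moveto` are standard Tk canvas commands.

```tcl
# Pure helper: the fraction to pass to "xview moveto" / "yview moveto" so the
# span lo..hi is centered in a view of size $view inside the scrollregion
# rmin..rmax. The result is clamped so we never scroll past either end.
proc center_fraction {lo hi view rmin rmax} {
    set total [expr {double($rmax - $rmin)}]
    if {$total <= $view} { return 0.0 }    ;# the whole region already fits
    set start [expr {($lo + $hi) / 2.0 - $view / 2.0 - $rmin}]
    set maxfrac [expr {($total - $view) / $total}]
    return [expr {min(max($start / $total, 0.0), $maxfrac)}]
}

# Tk part: needs a real canvas with a -scrollregion (not exercised in tests).
# Scrolls so that all items carrying $tag end up centered in the window.
proc scroll_to_tag {canvas tag} {
    set box [$canvas bbox $tag]
    set region [$canvas cget -scrollregion]
    if {$box eq "" || [llength $region] != 4} { return }
    lassign $box x1 y1 x2 y2
    lassign $region rx1 ry1 rx2 ry2
    $canvas xview moveto [center_fraction $x1 $x2 [winfo width $canvas] $rx1 $rx2]
    $canvas yview moveto [center_fraction $y1 $y2 [winfo height $canvas] $ry1 $ry2]
}
```

After a Find match, something like `scroll_to_tag $tkcanvas selected` would bring the match into view; knowing when to drop the tag again (e.g. with the canvas `dtag` command on deselection) is the part Hans expects to be trickier.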
[PD] Search plugin update
I updated the homepage of the search plugin to point to a Pd glossary that I wrote a while back and forgot about. It's kind of neat: you can add entries to the text file doc/5.reference/glossary.txt, and doc/5.reference/glossary.pd will parse the file, sort the entries in alphabetical order, and display them in the patch with links to objects related to the terms. The entries probably need some work, so feel free to make or suggest changes.

Unfortunately, Ctrl-F Find won't scroll to the relevant part of a long patch if the match happens to be out of view. Is there a way to fix this?

-Jonathan
Re: [PD] search plugin update
Placing the window at 0 0 is problematic on a couple of platforms. On Mac OS X, the menubar is always there, so the window's title bar ends up behind the menubar. A similar problem happens on GNOME.

.hc

On Aug 25, 2011, at 5:33 PM, Jonathan Wilkes wrote:
> Fixed search window to appear at 0 0 on when it's first created.

All mankind is of one author, and is one volume; when one man dies, one chapter is not torn out of the book, but translated into a better language; and every chapter must be so translated
-John Donne
Re: [PD] search plugin update
Ok, I fixed the weird resizing issue when the text in the status area is larger than the window. I also fixed the search window to appear at 0 0 when it's first created, fixed the font sizing bindings, and fixed the minimum font size.

-Jonathan

----- Original Message -----
From: Hans-Christoph Steiner h...@at.or.at
To: Mathieu Bouchard ma...@artengine.ca
Cc: Jonathan Wilkes jancs...@yahoo.com; pd-list List pd-list@iem.at
Sent: Sunday, August 7, 2011 5:38 PM
Subject: Re: [PD] search plugin update

> It'd be something to test, Cmd-+ might work as a keybinding, and would then work on other keyboards. Or perhaps you can just bind to both Cmd-Shift-+ and Cmd-+.

# plugin to allow searching all the documentation using a regexp
# check the Help menu for the Search item to use it

# Bugs:
# tiny text in combobox dropdown menu on Windows
# can't interrupt long searches on Windows (never get them in Fedora 15)

# Todo:
# try to clean up user input to prevent regex error messages

package require Tk 8.5
package require pd_bindings
package require pd_menucommands

namespace eval ::dialog_search:: {
    variable searchtext {}
    variable search_history {}
    variable count {}
    variable genres [list [_ "All documents"] \
                         [_ "Object Help Patches"] \
                         [_ "All About Pd"] \
                         [_ "Tutorials"] \
                         [_ "Manual"] \
                         [_ "Uncategorized"] \
                    ]
}

# TODO check line formatting options

# find_doc_files
# basedir - the directory to start looking in
proc ::dialog_search::find_doc_files { basedir } {
    # Fix the directory name, this ensures the directory name is in the
    # native format for the platform and contains a final directory separator
    set basedir [string trimright [file join $basedir { }]]
    set fileList {}

    # Look in the current directory for matching files, -type {f r}
    # means only readable normal files are looked at, -nocomplain stops
    # an error being thrown if the returned list is empty
    foreach fileName [glob -nocomplain -type {f r} -path $basedir $helpbrowser::doctypes] {
        lappend fileList $fileName
    }

    # Now look for any subdirectories in the current directory
    foreach dirName [glob -nocomplain -type {d r} -path $basedir *] {
        # Recursively call the routine on the subdirectory
        # (if it's not already in Pd's search path) and
        # append any new files to the results
        set nomatch [lsearch [concat [file join $::sys_libdir doc] $::sys_searchpath $::sys_staticpath] $dirName]
        if { $nomatch eq -1 } {
            set subDirList [find_doc_files $dirName]
            if { [llength $subDirList] > 0 } {
                foreach subDirFile $subDirList {
                    lappend fileList $subDirFile
                }
            }
        }
    }
    return $fileList
}

# TODO: break up into: l
proc ::dialog_search::open_file { xpos ypos mytoplevel clicked } {
    set textwidget $mytoplevel.resultstext
    set i [$textwidget index @$xpos,$ypos]
    set range [$textwidget tag nextrange filename $i]
    set filename [eval $textwidget get $range]
    set range [$textwidget tag nextrange basedir $i]
    set basedir [eval $textwidget get $range]
    append basedir /
    if {$clicked eq 1} {
        if {$filename ne ""} {
            menu_doc_open $basedir $filename
        }
    } else {
        $mytoplevel.statusbar configure -text "Open $basedir$filename"
    }
}

# only does keywords for now-- maybe expand this to handle any meta tags
proc ::dialog_search::grab_metavalue { xpos ypos mytoplevel clicked } {
    set textwidget $mytoplevel.resultstext
    #set xpos_offset 20
    #set xpos [expr {$xpos + $xpos_offset}]
    set i [$textwidget index @$xpos,$ypos]
    set range [$textwidget tag nextrange metavalue_h $i]
    set value [eval $textwidget get $range]
    set text {keywords.*}
    append text $value
    if {$clicked eq 1} {
        set
Re: [PD] search plugin update
On Sat, 6 Aug 2011, Hans-Christoph Steiner wrote:

> - on Mac OS X Cmd-Shift-= (i.e. Cmd-+) is the standard key for increasing the size of the text. Currently, it's Cmd-=.

It will break on keyboard layouts that are not QWERTY or that are heavily modified QWERTY. When I designed some things in the default DD keyboard bindings, I only had US keyboards and CF-family keyboards in mind (the French QWERTY used in Québec), and then someone notified me that I couldn't distinguish Alt+Shift+1 from Alt+1 because 1 is already shifted in AZERTY (it's Shift-, whereas is not shifted). German QWERTZ has = on Shift+0 and * on Shift++, meaning + is unshifted; however, Swiss QWERTZ has + shifted as Shift+1, and then there are other QWERTZ variants beyond those...

___
| Mathieu Bouchard tél: +1.514.383.3801 Villeray, Montréal, QC
Re: [PD] search plugin update
On Aug 7, 2011, at 2:51 PM, Mathieu Bouchard wrote:

> It will break on keyboard layouts that are not QWERTY or that are heavily modified QWERTY.

It'd be something to test: Cmd-+ might work as a keybinding, and would then work on other keyboards. Or perhaps you can just bind to both Cmd-Shift-+ and Cmd-+. For other platforms it's not a big deal, since the keybindings are not very consistent. On Mac OS X, they are quite consistent across the OS and apps, so people notice wrong bindings a lot more.

.hc

“We must become the change we want to see.
- Mahatma Gandhi
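Hans's "bind to both" suggestion, together with the minimum font size fixed later in the thread, might be sketched as below. All proc names and the default minimum of 6 are hypothetical. The modifier is passed in (Mod1, i.e. Command, on Mac OS X and Control elsewhere). Binding by keysym (`plus`, `equal`) rather than by physical key should in principle cope with layouts that put `+` elsewhere, and Tk lets a binding match when extra modifiers such as Shift are held, but as Hans says this is worth testing on real keyboards.

```tcl
# Event sequences for "bigger" and "smaller" font keys.
# <mod-Key-plus> also fires for Cmd-Shift-= on US layouts, because Tk
# tolerates extra modifiers (Shift) that the pattern does not mention.
proc font_key_events {modifier} {
    return [dict create \
        bigger  [list <$modifier-Key-plus> <$modifier-Key-equal> <$modifier-Key-KP_Add>] \
        smaller [list <$modifier-Key-minus> <$modifier-Key-KP_Subtract>]]
}

# New font size after a +/- step, never going below a minimum.
proc next_font_size {size delta {minsize 6}} {
    return [expr {max($minsize, $size + $delta)}]
}

# Tk part (not exercised in tests): attach scripts to a window.
proc bind_font_keys {window modifier bigger_script smaller_script} {
    set events [font_key_events $modifier]
    foreach ev [dict get $events bigger]  { bind $window $ev $bigger_script }
    foreach ev [dict get $events smaller] { bind $window $ev $smaller_script }
}
```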
Re: [PD] search plugin update
It's definitely getting quite polished. Two little details on Mac OS X are odd:
- on Mac OS X, Cmd-Shift-= (i.e. Cmd-+) is the standard key for increasing the size of the text. Currently, it's Cmd-=.
- on Mac OS X, when I mouse over the object name in the search results, the width of the whole window jumps, because the full path displayed at the bottom of the window is a lot longer than the normal window size would display.

.hc

On Aug 4, 2011, at 5:49 PM, Jonathan Wilkes wrote:
> Search plugin revision:
> * added status bar shows link locations and search text (if searching for a keyword tag)

If you are not part of the solution, you are part of the problem.
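One way to keep the window width from jumping is to cap the length of the status-bar text. The `elide_path` helper below is hypothetical and keeps the end of the path (the file name) visible; giving the status label a fixed `-width` would be another option.

```tcl
# Shorten a path for display, keeping its tail, so the status bar text
# never asks for more than maxchars characters of width.
proc elide_path {path {maxchars 60}} {
    if {$maxchars < 4} { set maxchars 4 }   ;# room for "..." plus one char
    if {[string length $path] <= $maxchars} { return $path }
    set keep [expr {$maxchars - 3}]
    return "...[string range $path end-[expr {$keep - 1}] end]"
}
```

In `open_file` this could be used as `$mytoplevel.statusbar configure -text [elide_path "Open $basedir$filename"]`.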
[PD] search plugin update
Search plugin revision:
* added a status bar that shows link locations and the search text (if searching for a keyword tag)
* quoted text works, e.g., "it's a secret to everybody"
* regexes seem to work, e.g., outlet.*symbol will match all objects that output a symbol
* Pd-style word boundaries, e.g., clip~ works when the "whole words" option is checked
* font +/- with the Ctrl-plus/Ctrl-minus keys

-Jonathan

# plugin to allow searching all the documentation using a regexp
# check the Help menu for the Search item to use it

# Bugs:
# tiny text in combobox dropdown menu on Windows
# can't interrupt long searches on Windows (never get them in Fedora 15)

# Todo:
# try to clean up user input to prevent regex error messages

package require Tk 8.5
package require pd_bindings
package require pd_menucommands

namespace eval ::dialog_search:: {
    variable searchtext {}
    variable search_history {}
    variable count {}
    variable genres [list [_ "All documents"] \
                         [_ "Object Help Patches"] \
                         [_ "All About Pd"] \
                         [_ "Tutorials"] \
                         [_ "Manual"] \
                         [_ "Uncategorized"] \
                    ]
}

# TODO check line formatting options

# find_doc_files
# basedir - the directory to start looking in
proc ::dialog_search::find_doc_files { basedir } {
    # Fix the directory name, this ensures the directory name is in the
    # native format for the platform and contains a final directory separator
    set basedir [string trimright [file join $basedir { }]]
    set fileList {}

    # Look in the current directory for matching files, -type {f r}
    # means only readable normal files are looked at, -nocomplain stops
    # an error being thrown if the returned list is empty
    foreach fileName [glob -nocomplain -type {f r} -path $basedir $helpbrowser::doctypes] {
        lappend fileList $fileName
    }

    # Now look for any subdirectories in the current directory
    foreach dirName [glob -nocomplain -type {d r} -path $basedir *] {
        # Recursively call the routine on the subdirectory
        # (if it's not already in Pd's search path) and
        # append any new files to the results
        set nomatch [lsearch [concat [file join $::sys_libdir doc] $::sys_searchpath $::sys_staticpath] $dirName]
        if { $nomatch eq -1 } {
            set subDirList [find_doc_files $dirName]
            if { [llength $subDirList] > 0 } {
                foreach subDirFile $subDirList {
                    lappend fileList $subDirFile
                }
            }
        }
    }
    return $fileList
}

# TODO: break up into: l
proc ::dialog_search::open_file { xpos ypos mytoplevel clicked } {
    set textwidget $mytoplevel.resultstext
    set i [$textwidget index @$xpos,$ypos]
    set range [$textwidget tag nextrange filename $i]
    set filename [eval $textwidget get $range]
    set range [$textwidget tag nextrange basedir $i]
    set basedir [eval $textwidget get $range]
    append basedir /
    if {$clicked eq 1} {
        if {$filename ne ""} {
            menu_doc_open $basedir $filename
        }
    } else {
        $mytoplevel.statusbar configure -text "Open $basedir$filename"
    }
}

# only does keywords for now-- maybe expand this to handle any meta tags
proc ::dialog_search::grab_metavalue { xpos ypos mytoplevel clicked } {
    set textwidget $mytoplevel.resultstext
    #set xpos_offset 20
    #set xpos [expr {$xpos + $xpos_offset}]
    set i [$textwidget index @$xpos,$ypos]
    set range [$textwidget tag nextrange metavalue_h $i]
    set value [eval $textwidget get $range]
    set text {keywords.*}
    append text $value
    if {$clicked eq 1} {
        set ::dialog_search::searchtext ""
        set ::dialog_search::searchtext $text
        ::dialog_search::search
    } else {
        $mytoplevel.statusbar configure -text $text
    }
}

# show/hide results based on genre
proc ::dialog_search::filter_results { combobox text } {
    variable genres
    set elide {}
    if { [$combobox current] eq 0 } {
        foreach genre $genres {
            $text tag configure [join $genre _] -elide off
            set tag [join $genre _]
            append tag _count
            $text tag configure $tag -elide on
        }
        set tag [join [lindex $genres 0] _]
        append tag _count
        $text tag configure $tag -elide off
    } else {
        foreach genre $genres {
            if { [$combobox get] ne $genre } {
                $text tag configure [join $genre _] -elide on
                set tag [join $genre _]
                append tag _count
                $text tag configure $tag -elide on
            } else {
                $text tag configure [join $genre _] -elide off
                set tag [join $genre _]
                append tag _count
                $text tag configure $tag -elide off
            }
        }
    }
    $combobox selection clear
    focus $text
}

proc ::dialog_search::readfile {filename} {
    set fp [open $filename]
    set file_contents [split [read $fp] \n]
    close $fp
    return $file_contents
}

proc ::dialog_search::search { } {
    variable searchtext
    variable search_history
    if {$searchtext eq ""} return
    if { [lsearch $search_history $searchtext] eq -1 } {
        lappend
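The "Pd-style word boundaries" item can be illustrated with Tcl's `regexp`. Tcl's `\m`/`\M` word anchors treat `~` as a non-word character, so `\mclip~\M` can never match; the sketch below instead treats whitespace, commas and semicolons (the separators between atoms in a .pd file) as boundaries. `pd_word_pattern` and `regex_escape` are hypothetical helpers, not the plugin's code; `regex_escape` is the usual Tcl idiom for matching user text literally, which bears on the Todo about regex error messages.

```tcl
# Wrap a (regex) pattern so it only matches whole Pd atoms: the match must
# start at the beginning of the text or after whitespace, a comma or a
# semicolon, and end at the end of the text or before one of those.
proc pd_word_pattern {pattern} {
    return [format {(?:^|[\s;,])(?:%s)(?:$|[\s;,])} $pattern]
}

# Escape regex metacharacters so user text is matched literally
# (useful for a plain-text mode; the plugin itself accepts regexes).
proc regex_escape {text} {
    regsub -all {[][{}()*+?.\\^$|]} $text {\\&} escaped
    return $escaped
}
```

With this, `regexp [pd_word_pattern clip~] $line` finds `clip~` as an object name in a patch line but not inside a longer name such as `myclip~`.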