Re: [CODE4LIB] code4lib mailing list
On Thu, Mar 24, 2016 at 10:39:03AM -0400, Ranti Junus wrote:
> Mailman is easy to administer, but it has a huge caveat: when a user
> request a password (reminder, etc.), it sends it as an email in plain text.

Fortunately, list administrators can disable the monthly password reminder. Site administrators can disable reminders by default for all lists by setting DEFAULT_SEND_REMINDERS = No in mm_cfg.py.

I agree that Mailman is the best open source mailing list software, and I doubt that there is a better proprietary solution.

I could apply for a Mailman list from my university, but it would have to have the uni-bielefeld.de domain in its address; no custom domain is possible.

Has anybody asked JISC if they would host this list? They have accepted international mailing lists in the past provided that some British users are involved.

Please do not use Google Groups – Google already tracks 78% of all website visits globally, so it's time to stop giving them data <https://timlibert.me/pdf/Libert-2015-Health_Privacy_on_Web.pdf>.

Greetings from Europe!
Christian

--
Christian Pietsch · http://www.ub.uni-bielefeld.de/~cpietsch
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany
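A minimal sketch of the mm_cfg.py setting mentioned above, assuming a Mailman 2.1 installation where mm_cfg.py is a Python file that imports its defaults from Defaults.py. As far as I know, this default only applies to lists created after the change; existing lists keep their own per-list reminder setting, which list administrators can switch off in the web interface.

```python
# mm_cfg.py: site-wide overrides for Mailman 2.1 (fragment, not a complete file)
from Defaults import *   # provides Yes/No and all default settings

# Do not send the monthly plain-text password reminder on newly created lists.
DEFAULT_SEND_REMINDERS = No
```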
Re: [CODE4LIB] Protocol-relative URLs in MARC
On Tue, Aug 18, 2015 at 09:29:17PM +1200, Stuart A. Yeates wrote:
> While these may appear to be OAI-PMH providers, they're non-conformant:
>
> http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures
>
> OAI-PMH requests *must* be submitted using either the HTTP GET or POST
> methods.

Everything that holds for HTTP also holds for HTTPS because HTTPS is simply HTTP over TLS, as the HTTPS standard is aptly titled:
https://tools.ietf.org/html/rfc2818

A discussion on the OAI implementers mailing list seemed to converge on the position to accept HTTPS wherever possible but not to require it. That was in 2005, when the IETF had not yet started to consider declaring HTTP without TLS obsolete altogether.
https://www.openarchives.org/pipermail/oai-implementers/2005-February/001419.html

> Maybe because forcing people to upgrade their tech leaves behind those with
> the least resources. Maybe because switching to a protocol whose minimum
> message cost (in cpu cycles) is many thousands of times higher is a dubious
> cost/benefit trade-off in some situations.

The burden of TLS encryption on CPUs is negligible these days:
https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

C:

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec (Library Technology and Knowledge Management) department of
Bielefeld University Library, Bielefeld, Germany
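To illustrate the point that an OAI-PMH request looks the same over HTTP and HTTPS, here is a small Python sketch. The base URLs under repository.example.org are made up for illustration, and the helper names are my own; only the verbs (Identify, ListRecords) and the metadataPrefix argument come from the OAI-PMH specification.

```python
from urllib.parse import urlencode, urlsplit


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL for a base URL and a verb."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


def uses_tls(url: str) -> bool:
    """Return True if the URL scheme is https, i.e. HTTP over TLS."""
    return urlsplit(url).scheme.lower() == "https"


# Hypothetical repositories: the request is built identically in both cases,
# only the scheme of the base URL differs.
http_url = oai_request_url("http://repository.example.org/oai", "Identify")
https_url = oai_request_url(
    "https://repository.example.org/oai", "ListRecords", metadataPrefix="oai_dc"
)

print(http_url, uses_tls(http_url))
print(https_url, uses_tls(https_url))
```

A harvester written this way does not need to care which scheme a provider uses, which is roughly the "accept HTTPS but do not require it" position.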
Re: [CODE4LIB] Protocol-relative URLs in MARC
Thank you, Andrew, for answering the question. What Stuart wrote, however, is misleading:

On Tue, Aug 18, 2015 at 02:59:37PM +1200, Stuart A. Yeates wrote:
> On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson wrote:
>
> > That said, there is a big push recently for dropping non-SSL connections
> > in general (going so far as to call the protocol relative URIs an
> > anti-pattern), so is it really worth all the potential pain and suffering
> > to make your links scheme-agnostic, when maybe it would be a better
> > investment in time to switch them all to SSL instead? This dovetails
> > nicely with some of the discussions I have had recently with electronic
> > services librarians about how to protect patron privacy in an online world
> > by using SSL as an arrow in that quiver.
> >
>
> Dropping non-SSL connections is almost certainly a mistake for two classes
> reasons:
> (i) a number of very widely used tools and standards (OAI-PMH, web
> cacheing, monitoring, etc.) are HTTP-only

Let me give you a counterexample: Of 4810 OAI-PMH providers currently known to BASE <https://base-search.net>, 147 use an HTTPS base URL. Of the 3632 OAI-PMH sources BASE actively harvests at this time, 107 use HTTPS.

> (ii) assumptions about the proportion of our users who have access
> to a certain level tech (i.e. HTTP vs HTTPS) systematically disadvantages
> already disadvantaged groups of users, perpetuating the kind of
> social ills that libraries are traditional held to be the cure of.

I fail to see how continuing to use insecure, obsolete software is serving social justice. Excellent cryptographic software is available freely and openly.

Cheers,
Chris

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec (Library Technology and Knowledge Management) department of
Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] XSLT stylesheet from MARC21 to RIS or BIBTEX
Hi Zeno,

here is a way to do one part in XSLT and another using a C program:

1. Use an XSLT stylesheet to convert MARC21 to MODS
   <http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl>.

2. Then, use bibutils <http://sourceforge.net/projects/bibutils/>
   to convert MODS to your target formats. You will need:

   xml2bib - convert MODS into bibtex
   xml2ris - convert MODS into RIS format

Or if you can use Perl, do the conversion in one go using Catmandu <https://librecatproject.wordpress.com/2014/12/11/day-9-processing-marc-with-catmandu/>.

Cheers,
Christian

On Tue, Jul 14, 2015 at 10:27:20AM +0200, Tajoli Zeno wrote:
> Hi to all,
>
> do you know XSLT stylesheet to do conversion from
> MARC21 to RIS or from MARC21 to BIBTEX ?
>
> Bye
> Zeno Tajoli

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec (Library Technology and Knowledge Management) department of
Bielefeld University Library, Bielefeld, Germany
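The two-step recipe above could be scripted roughly as follows. This is a sketch under several assumptions: the input is already MARCXML (the stylesheet does not read binary MARC), a local copy of MARC21slim2MODS3.xsl sits in the working directory, and xsltproc is used as the XSLT processor (any XSLT 1.0 processor should do). The function names and the file name records.xml are hypothetical; xml2bib and xml2ris are the bibutils tools named above, which write their result to standard output.

```python
import subprocess
from pathlib import Path

# Assumed: a local copy of the Library of Congress stylesheet.
STYLESHEET = "MARC21slim2MODS3.xsl"

# bibutils tools for the two target formats.
TOOLS = {"bibtex": "xml2bib", "ris": "xml2ris"}


def marcxml_to_mods_cmd(marcxml, mods):
    """Command line for step 1: MARCXML -> MODS via XSLT."""
    return ["xsltproc", "--output", str(mods), STYLESHEET, str(marcxml)]


def mods_to_target_cmd(mods, target):
    """Command line for step 2: MODS -> BibTeX or RIS via bibutils."""
    return [TOOLS[target], str(mods)]


def convert(marcxml, target):
    """Run both steps and return the converted records as text."""
    source = Path(marcxml)
    mods = source.with_name(source.stem + ".mods.xml")
    subprocess.run(marcxml_to_mods_cmd(source, mods), check=True)
    result = subprocess.run(
        mods_to_target_cmd(mods, target),
        check=True, capture_output=True, text=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical input file; requires xsltproc and bibutils to be installed.
    print(convert("records.xml", "bibtex"))
```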
Re: [CODE4LIB] Very frustrated with Drupal
Hi Joshua,

On Thu, May 15, 2014 at 09:47:06AM -0500, Joshua Welker wrote:
> Thank you all for the responses. I hope my original email did not come off
> as too abrasive.

No worries, I find it a fair depiction, and I share your Drupal pain.

> The issue for me is that I am having a hard time figuring out what exactly
> is the use case for Drupal. Do you want a dead-simple website? Use
> Wordpress. Do you want to add some complex custom apps? Use a framework. Do
> you want the worst of both worlds? Use Drupal.

Right. May I quote you on this? I prefer static site generators such as Jekyll for dead-simple websites and blogs.

> If I get hit by a bus, not only will someone have
> to relearn Drupal and all its modules, but they will also have to wade
> through my spaghetti-code efforts at patching functionality into Drupal.

After I decided to leave a project where I had developed a Drupal intranet site, my successor scrapped it and started from scratch using Owncloud. And I do not blame him. I would have preferred using something other than Drupal, too, but was not allowed to at the time. (In case you wonder how Drupal and Owncloud can fit the same purpose: The goal was to develop a Virtual Research Environment, and nobody knows for sure what this is supposed to be, so there is room for interpretation.)

> Right now, my framework choices are narrowed down to Ruby on Rails, Laravel
> (PHP), Django (Python), and Flask (Python). For anyone who has used these,
> do you have any insight into how maintainable your projects are and how
> easily they are managed/inherited by others?

In my new role, I inherited some Flask applications, and I find maintaining, debugging and extending them pure joy. If you have to use Perl instead of Python, use Dancer instead. I also tried Django, but I feel it forces me into a corset that is an odd re-interpretation (or misunderstanding) of the MVC model.

Cheers,
Christian

--
Christian Pietsch
http://purl.org/net/pietsch
Re: [CODE4LIB] Python in Your Library
Let's not start another discussion about programming languages, but for the record:

On Wed, May 07, 2014 at 09:49:16AM -0400, Molly Des Jardin wrote:
> This is not a complex point, but I've been using Python about since it came
> out (since 2002) and have found it both flexible and very easy to learn and

Python came out in 1991:
https://en.wikipedia.org/wiki/Python_%28programming_language%29

> use. I'd recommend it as a scripting language for beginning programmers and
> those experimenting around.

Python was designed as a teaching language. Today, it is a universal programming language suitable for all tasks. Low-level (systems) and performance-critical functions can be written in C if required.

At Bielefeld University Library we use Python for rapidly prototyping Web services based on the Flask framework. These services have proven stable and efficient, so there has been no need to rewrite them for production use. Examples include an online OAI-PMH validator <http://oval.base-search.net/>, a classifier for assigning DDC labels to English or German abstracts <http://clfapi.base-search.net/>, and an OAI-PMH interface for BASE, the Bielefeld Academic Search Engine.

If you work closely with researchers, it might be relevant to you that Python has become the dominant language in scientific programming because of great progress in the NumPy/SciPy/pandas frameworks.

The majority of our software is still developed in Perl – for historical reasons, and because there are exciting new frameworks in Perl such as the ETL framework Catmandu <http://librecat.org/>.

Cheers
Christian

--
Christian Pietsch
http://purl.org/net/pietsch
Re: [CODE4LIB] transforming marc to rdf
Hi Eric,

you seem to have missed the Catmandu tutorial at SWIB13. Luckily there is a basic tutorial and a demo online: http://librecat.org/

The demo happens to be about transforming MARC to RDF using the Catmandu Perl framework. It gives you full flexibility by separating the importer from the exporter and providing a domain-specific language for “fixing” the data in between. Catmandu also has easy-to-use wrappers for popular search engines and databases (both SQL and NoSQL), making it a complete ETL (extract, transform, load) toolkit.

Disclosure: I am a Catmandu contributor. It's free and open source software.

Cheers,
Christian

On Wed, Dec 04, 2013 at 09:59:46PM -0500, Eric Lease Morgan wrote:
> Converting MARC to RDF has been more problematic. There are various
> tools enabling me to convert my original MARC into MARCXML and/or
> MODS. After that I can reportably use a few tools to convert to RDF:
>
>   * MARC21slim2RDFDC.xsl [3] - functions, but even for
>     my tastes the resulting RDF is too vanilla. [4]
>
>   * modsrdf.xsl [5] - optimal, but when I use my
>     transformation engine (Saxon), I do not get XML
>     but rather plain text
>
>   * BIBFRAME Tools [6] - sports nice ontologies, but
>     the online tools won’t scale for large operations

--
Christian Pietsch · http://www.ub.uni-bielefeld.de/~cpietsch/
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] We should use HTTPS on code4lib.org
On Mon, Nov 04, 2013 at 01:45:12PM -0500, Ethan Gruber wrote:
> NSA broke it already

Very funny but untrue. While it is certainly possible to create an insecure TLS certificate, for all we know it is not true that TLS has been broken in general. It is still one of the most usable protections against eavesdropping and is still recommended by independent encryption expert Bruce Schneier [1] and the Freedom of the Press Foundation [2], among others.

In a recent recommendation, the BSI (the German federal office for IT security) recommended TLS 1.2 with Perfect Forward Secrecy [3] for all HTTPS setups. I am sure it would be beneficial for code4lib.org, too.

Cheers,
Christian
Bielefeld University Library, Germany

References:
[1] http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance
[2] https://pressfreedomfoundation.org/encryption-works
[3] http://www.pro-linux.de/news/1/20333/bsi-empfiehlt-tls-12-mit-perfect-forward-secrecy.html
Re: [CODE4LIB] Question for Institutional Repository Folks
Hi Matt,

if you are certain that a PDF file was encumbered with DRM restrictions by mistake, then you can easily remove DRM using a tool from the free MuPDF software, which is available for all major operating systems including Windows: http://www.mupdf.com/

If you have the current version, the command line goes like this:

  mutool clean old.pdf new.pdf

Older versions of MuPDF included a different executable for this:

  pdfclean old.pdf new.pdf

As for editing PDF files ... this is not what they are intended for, but it is possible with tools like PDFedit <http://pdfedit.cz>, Gimp, Inkscape or Scribus.

Cheers,
Christian

On Mon, Oct 28, 2013 at 01:13:24PM -0400, Matthew Sherman wrote:
> Can anyone give me some advice in how I can edit this to add the
> required note to the top of the PDF? Any advice is welcome.

--
Christian Pietsch
http://purl.org/net/pietsch
Re: [CODE4LIB] pdf2txt [tesseract]
Hi Padraic,

I have uploaded a shell script which happens to implement Robert Haschart's recipe:
https://github.com/pietsch/Data-Munging/blob/master/ocr4pdf.sh

Enjoy!
Christian

On Fri, Oct 18, 2013 at 10:22:17AM +0100, Padraic Stack wrote:
> I would love to see that bash script if you could upload it.

--
Christian Pietsch, http://www.ub.uni-bielefeld.de/~cpietsch/
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, 33615 Bielefeld, Germany
Re: [CODE4LIB] pdf2txt [tesseract]
Hi Eric,

On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
> Robert, can you outline the process you used to get Tesseract to do
> OCR agains PDF documents? I installed Tesseract a few months ago,
> but I couldn't figure out how to get to work against PDF, only some
> image files. Any pointers would be greatly appreciated. (Hmmm. Maybe
> Tesseract doesn't do PDF files, only image files, and I need to
> convert my PDFs to images, and then the to Tesseract.) --Eric Morgan

Once you have Tesseract installed, the easiest way (IMHO) to use it for adding an OCR text layer to PDF files is this Ruby script:
https://github.com/gkovacs/pdfocr

Geza Kovacs wrote it for Cuneiform and an old version of OCRopus. I added Tesseract support later.

If you cannot use Ruby for some reason, I could upload a Bash script doing the same thing.

Cheers,
Christian

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] ElasticSearch
On Thu, Mar 14, 2013 at 06:49:28PM +, Lin, Kun wrote:
> That's something pretty pricy.

Are you joking? It's free and open-source software:
https://github.com/elasticsearch/elasticsearch

Some of my colleagues at Bielefeld University Library's LibTec department are using it with LibreCat <http://librecat.org/> to power our university's central publication data service PUB <http://pub.uni-bielefeld.de/>. They seem to be happy with it.

In other projects, we stick to Solr or even plain old Lucene.

What are you looking to use ES for?

Cheers,
Christian

--
Christian Pietsch · http://purl.org/net/pietsch
LibTec · Library Technology and Knowledge Management
Bielefeld University Library, Bielefeld, Germany
Re: [CODE4LIB] Wikis
Hi Nathan,

given the huge user base of MediaWiki, you would need very good reasons (read: special requirements) to choose anything else. Also, the large developer community makes MediaWiki a more future-proof choice than anything commercial backed by a single company.

On Tue, Jul 24, 2012 at 04:34:27PM -0400, Nathan Tallman wrote:
> There are a plethora of options for wiki software. Does anyone have any
> recommendations for a platform that's easy-to-use and has a low-learning
> curve for users?

I think it is fair to say that everyone who uses the Internet also uses Wikipedia, either passively or actively. Have you noticed that search engines will usually return a link to a Wikipedia article on the first page of results, no matter what you are looking for? Hence, there will be hardly any learning curve if you choose MediaWiki.

At my university, I run a small internal MediaWiki farm for purposes like yours. My signature below links to two spare-time projects: These are public MediaWiki installations I run elsewhere on a rented virtual private server (Linux VPS). One is using the Semantic MediaWiki extension to implement a database of text generation software systems and related publications; the other serves as a lightweight Web content management system (WCMS) for a special interest group of a research association.

I have found MediaWiki easy to use, install and maintain, and so far I have always found a suitable free extension whenever the included functionality did not suffice.

On the other hand, if you need fine-grained access controls, then you do not want a wiki but a full, traditional WCMS.

Cheers,
Christian

--
Christian Pietsch
http://www.nlg-wiki.org/ · http://www.sigsem.org/
Bielefeld University Library and CRC 882
Bielefeld, Germany
Re: [CODE4LIB] Any ideas for free pdf to excel conversion?
On Wed, Dec 14, 2011 at 02:19:43PM -0600, Jon Gorman wrote:
> pdftotext -> some cut & paste / sed / regex -> open in excel?
>
> You might need to fiddle with the pdftotext settings, but I've been
> pretty successful with that before doing something else.

This is how I use pdftotext for this purpose:

  pdftotext -nopgbrk -layout input.pdf output.txt

For those who wonder what this is: pdftotext is a command-line tool from the poppler-utils package (this is what it is called in Debian and Ubuntu Linux; see http://poppler.freedesktop.org for source code). The Windows version is here: http://www.foolabs.com/xpdf/download.html

The resulting file, here called output.txt, contains plain text with the layout left approximately intact. Now you can (manually or otherwise) save the tables from this file into files with .csv, .tsv or .dat endings, and with any luck, R's read.table() function and other statistics software as well as most spreadsheet software will be able to open these files and make sense of them. Otherwise, you will need to do some postprocessing/postediting.

Cheers,
Christian

--
Christian Pietsch <http://purl.org/net/pietsch>
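The post-processing step described above (turning a table from the pdftotext output into something a spreadsheet can open) might look like this in Python. It is only a sketch: it assumes that columns in the -layout output are separated by runs of two or more spaces, which will not hold for every PDF, and the sample table is invented for illustration.

```python
import csv
import io
import re

# Assumption: in "pdftotext -layout" output, columns are separated by
# at least two consecutive whitespace characters.
COLUMN_GAP = re.compile(r"\s{2,}")


def layout_lines_to_rows(text):
    """Split layout-preserved text into rows of cells, skipping blank lines."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped:
            rows.append(COLUMN_GAP.split(stripped))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text that spreadsheet software can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample, standing in for a table cut out of output.txt.
sample = (
    "Year    Loans    Renewals\n"
    "2010    1200     340\n"
    "\n"
    "2011    1350     410\n"
)

print(rows_to_csv(layout_lines_to_rows(sample)), end="")
```

Cells that contain single spaces stay in one piece with this approach, but empty cells shift the remaining columns to the left, so the result still needs a quick check by eye.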
Re: [CODE4LIB] Linux Laptop
Hi Chris,

congratulations on your decision. I went from DOS and Windows to Linux and Mac OS X, but after a few months I returned to Linux for good (firing up Windows only to fill out the occasional MS Word form that looks weird in LibreOffice).

You have already received a lot of good advice, so as well as adding my own 2 cents, I will try to take that into account.

Which Linux?

My guess is that coming from Mac OS, Ubuntu will be the Linux distribution you will feel most comfortable with. It is the most popular Linux distro these days anyway, so you can hardly go wrong with it. I have used it almost exclusively in recent years, and I find it worth mentioning that the Ubuntu community is helpful and friendly indeed. You will not often find arrogant BOFH responses in Ubuntu forums because all Ubuntu contributors have signed a very reasonable code of conduct <http://www.ubuntu.com/project/about-ubuntu/conduct>.

As others have mentioned, you might want to try some Linux distributions and desktop environments in a virtual machine running on top of Mac OS X. VirtualBox is popular and free, and works just as well as the competition from VMware or Parallels.

Which hardware?

You may not need new hardware at all. Since the MacBook (Pro) is considered by many the best laptop (hardware), you may want to use it for Linux (and Windows, if you must) as well. The Ubuntu guide for people switching from Mac OS X contains some dual boot advice: https://help.ubuntu.com/community/SwitchingToUbuntu/FromMacOSX

I think it is fair to say that Linux runs on almost any laptop out there. If you want to make sure that every single feature is supported on the particular machine you have in mind, then take a look at the lists Chris Gray provided (quoted below), or this one: http://tuxmobil.org/mylaptops.html

Specifically for Ubuntu Linux, you will find compatibility reports on https://wiki.ubuntu.com/HardwareSupport/Machines/Laptops which is now being replaced by this site: https://friendly.ubuntu.com/

I have installed Ubuntu on Asus EeePC netbooks, Dell laptops and desktops, and Fujitsu servers. Of these, the Dell computers have caused no trouble at all. I have also heard good things about Lenovo's Thinkpads.

Enjoy!
Christian

On Wed, Dec 14, 2011 at 12:02:40PM -0500, Chris Gray wrote:
> It's worth Googling a bit. There are places that sell laptops with
> Linux pre-installed (which bypasses the Windows surtax on new PCs).
> It was easy to find these but I can't vouch for any of them.
>
> http://mcelrath.org/laptops.html - Linux Laptop Resellers
>
> http://www.linux-laptop.net/ - Linux on Laptops
>
> http://www.linuxcertified.com/linux_laptops.html - Linux Laptop -
> Fully Supported & Configured High Performance Linux Laptops and
> Netbooks | LinuxCertified
>
> http://linuxpreloaded.com/ - Buy a Linux Computer

--
Christian Pietsch <http://purl.org/net/pietsch>
computational linguist, Bielefeld University, Germany