Re: [CODE4LIB] code4lib mailing list

2016-03-24 Thread Christian Pietsch
On Thu, Mar 24, 2016 at 10:39:03AM -0400, Ranti Junus wrote:
> Mailman is easy to administer, but it has a huge caveat: when a user
> request a password (reminder, etc.), it sends it as an email in plain text.

Fortunately, list administrators can disable the monthly password
reminders. Site administrators can disable them by default for all
lists by setting DEFAULT_SEND_REMINDERS = No in mm_cfg.py.

I agree that Mailman is the best open source mailing list software,
and I doubt if there is a better proprietary solution. I could apply
for a Mailman list from my university, but it would have to have the
uni-bielefeld.de domain in its address; no custom domain possible.

Has anybody asked JISC if they would host this list? They have
accepted international mailing lists in the past provided that some
British users are involved.

Please do not use Google Groups – Google already tracks 78% of all
website visits globally, so it's time to stop giving them data
<https://timlibert.me/pdf/Libert-2015-Health_Privacy_on_Web.pdf>.

Greetings from Europe!
Christian

-- 
  Christian Pietsch · http://www.ub.uni-bielefeld.de/~cpietsch
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany



signature.asc
Description: PGP signature


Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-18 Thread Christian Pietsch
On Tue, Aug 18, 2015 at 09:29:17PM +1200, Stuart A. Yeates wrote:
> While these may appear to be OAI-PMH providers, they're non-conformant:
> 
> http://www.openarchives.org/OAI/openarchivesprotocol.html#ProtocolFeatures
> 
> OAI-PMH requests *must* be submitted using either the HTTP GET or POST
>  methods.

Everything that holds for HTTP also holds for HTTPS because HTTPS is
simply HTTP over TLS, as the HTTPS standard is aptly titled:
https://tools.ietf.org/html/rfc2818
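
To illustrate, here is a minimal Python sketch that builds an OAI-PMH
GET request. The repository host and the helper name are hypothetical
placeholders; the point is only that the request is identical apart
from the URL scheme.

```python
from urllib.parse import urlencode


def build_oai_request(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request against base_url."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository endpoints, used for illustration only.
plain = build_oai_request("http://repo.example.org/oai", "ListRecords",
                          metadataPrefix="oai_dc")
secure = build_oai_request("https://repo.example.org/oai", "ListRecords",
                           metadataPrefix="oai_dc")

print(plain)
print(secure)
```

The verb and its arguments travel in the HTTP request either way; TLS
only wraps the connection underneath.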

A discussion on the OAI implementers mailing list seemed to converge
on the position to accept HTTPS wherever possible but not to require
it. That was in 2005 when the IETF had not started to consider
declaring HTTP without TLS obsolete altogether.
https://www.openarchives.org/pipermail/oai-implementers/2005-February/001419.html

> Maybe because forcing people to upgrade their tech leaves behind those with
> the least resources. Maybe because switching to a protocol whose minimum
> message cost (in cpu cycles) is many thousands of times higher is a dubious
> cost/benefit trade-off in some situations.

The burden of TLS encryption on CPUs is negligible these days:
https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html

C:

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec (Library Technology and Knowledge Management) department
  of Bielefeld University Library, Bielefeld, Germany




Re: [CODE4LIB] Protocol-relative URLs in MARC

2015-08-18 Thread Christian Pietsch
Thank you, Andrew, for answering the question. What Stuart wrote,
however, is misleading:

On Tue, Aug 18, 2015 at 02:59:37PM +1200, Stuart A. Yeates wrote:
> On Tue, Aug 18, 2015 at 10:08 AM, Andrew Anderson  wrote:
> 
> > That said, there is a big push recently for dropping non-SSL connections
> > in general (going so far as to call the protocol relative URIs an
> > anti-pattern), so is it really worth all the potential pain and suffering
> > to make your links scheme-agnostic, when maybe it would be a better
> > investment in time to switch them all to SSL instead?  This dovetails
> > nicely with some of the discussions I have had recently with electronic
> > services librarians about how to protect patron privacy in an online world
> > by using SSL as an arrow in that quiver.
> >
> 
> Dropping non-SSL connections is almost certainly a mistake for two classes
> reasons:
> (i) a number of very widely used tools and standards (OAI-PMH, web
> cacheing, monitoring, etc.) are HTTP-only

Let me give you a counterexample: Of the 4810 OAI-PMH providers currently
known to BASE <https://base-search.net>, 147 use an HTTPS base URL. Of
the 3632 OAI-PMH sources BASE actively harvests at this time, 107 use
HTTPS.
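
Expressed as shares, that is roughly 3% in both cases. A small Python
calculation, using only the counts quoted above (the helper name is
made up for this sketch):

```python
def percentage(part, total):
    """Return part as a percentage of total, rounded to one decimal place."""
    return round(100 * part / total, 1)


known_total, known_https = 4810, 147          # all providers known to BASE
harvested_total, harvested_https = 3632, 107  # sources actively harvested

print(percentage(known_https, known_total))
print(percentage(harvested_https, harvested_total))
```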

> (ii) assumptions about the proportion of our users who have access
> to a certain level tech (i.e. HTTP vs HTTPS) systematically disadvantages
> already disadvantaged groups of users, perpetuating the kind of
> social ills that libraries are traditional held to be the cure of.

I fail to see how continuing to use insecure, obsolete software is
serving social justice. Excellent cryptographic software is available
freely and openly.

Cheers,
Chris

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec (Library Technology and Knowledge Management) department
  of Bielefeld University Library, Bielefeld, Germany




Re: [CODE4LIB] XSLT stylesheet from MARC21 to RIS or BIBTEX

2015-07-14 Thread Christian Pietsch
Hi Zeno,

here is a way to do one part in XSLT and another using a C program:

1. Use an XSLT stylesheet to convert MARC21 to MODS
   <http://www.loc.gov/standards/mods/v3/MARC21slim2MODS3.xsl>.

2. Then, use bibutils <http://sourceforge.net/projects/bibutils/>
   to convert MODS to your target formats. You will need:
   xml2bib   -   convert MODS into bibtex
   xml2ris   -   convert MODS into RIS format
 
Or if you can use Perl, do the conversion in one go using Catmandu
<https://librecatproject.wordpress.com/2014/12/11/day-9-processing-marc-with-catmandu/>.
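
The two-step XSLT and bibutils route above could be scripted roughly
as follows. This is a sketch under assumptions: it uses xsltproc as
the XSLT processor (any XSLT processor should do), the file names are
hypothetical, and the input must already be MARCXML, since the
stylesheet works on the MARC21 "slim" XML format rather than on binary
MARC.

```python
import subprocess


def conversion_commands(marcxml, stylesheet, target="bib"):
    """Return the two commands: MARCXML -> MODS, then MODS -> BibTeX or RIS."""
    if target not in ("bib", "ris"):
        raise ValueError("target must be 'bib' or 'ris'")
    mods = marcxml.rsplit(".", 1)[0] + ".mods.xml"
    to_mods = ["xsltproc", "-o", mods, stylesheet, marcxml]
    to_target = [f"xml2{target}", mods]  # bibutils writes to standard output
    return to_mods, to_target


def convert(marcxml, stylesheet, target="bib"):
    """Run both steps and return the converted records as text."""
    to_mods, to_target = conversion_commands(marcxml, stylesheet, target)
    subprocess.run(to_mods, check=True)
    result = subprocess.run(to_target, check=True,
                            capture_output=True, text=True)
    return result.stdout


# Example call (needs xsltproc and bibutils installed):
# print(convert("records.xml", "MARC21slim2MODS3.xsl", target="ris"))
```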

Cheers,
Christian


On Tue, Jul 14, 2015 at 10:27:20AM +0200, Tajoli Zeno wrote:
> Hi to all,
> 
> do you know XSLT stylesheet to do conversion from
> MARC21 to RIS or  from MARC21 to BIBTEX ?
> 
> Bye
> Zeno Tajoli

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec (Library Technology and Knowledge Management) department
  of Bielefeld University Library, Bielefeld, Germany




Re: [CODE4LIB] Very frustrated with Drupal

2014-05-15 Thread Christian Pietsch
Hi Joshua,

On Thu, May 15, 2014 at 09:47:06AM -0500, Joshua Welker wrote:
> Thank you all for the responses. I hope my original email did not come off
> as too abrasive.

No worries, I find it a fair depiction, and I share your Drupal pain.

> The issue for me is that I am having a hard time figuring out what exactly
> is the use case for Drupal. Do you want a dead-simple website? Use
> Wordpress. Do you want to add some complex custom apps? Use a framework. Do
> you want the worst of both worlds? Use Drupal.

Right. May I quote you on this? I prefer static site generators such
as Jekyll for dead-simple websites and blogs.

> If I get hit by a bus, not only will someone have
> to relearn Drupal and all its modules, but they will also have to wade
> through my spaghetti-code efforts at patching functionality into Drupal.

After I decided to leave a project where I had developed a Drupal
intranet site, my successor scrapped it and started from scratch using
Owncloud. And I do not blame him. I would have preferred using
something other than Drupal, too, but was not allowed to at the time.

(In case you wonder how Drupal and Owncloud can fit the same purpose:
The goal was to develop a Virtual Research Environment, and nobody
knows for sure what this is supposed to be, so there is room for
interpretation.)

> Right now, my framework choices are narrowed down to Ruby on Rails, Laravel
> (PHP), Django (Python), and Flask (Python). For anyone who has used these,
> do you have any insight into how maintainable your projects are and how
> easily they are managed/inherited by others?

In my new role, I inherited some Flask applications, and I find
maintaining, debugging and extending them pure joy. If you have to use
Perl instead of Python, use Dancer instead.

I also tried Django, but I feel it forces me into a corset that is
an odd reinterpretation (or misunderstanding) of the MVC model.

Cheers,
Christian

-- 
  Christian Pietsch
  http://purl.org/net/pietsch


Re: [CODE4LIB] Python in Your Library

2014-05-07 Thread Christian Pietsch
Let's not start another discussion about programming languages, but
for the record:

On Wed, May 07, 2014 at 09:49:16AM -0400, Molly Des Jardin wrote:
> This is not a complex point, but I've been using Python about since it came
> out (since 2002) and have found it both flexible and very easy to learn and

Python came out in 1991:
https://en.wikipedia.org/wiki/Python_%28programming_language%29

> use. I'd recommend it as a scripting language for beginning programmers and
> those experimenting around.

Python was designed as a teaching language. Today, it is a universal
programming language suitable for all tasks. Low-level (systems) and
performance-critical functions can be written in C if required.

At Bielefeld University Library we use Python for rapidly prototyping
Web services based on the Flask framework. These services have proven
stable and efficient, so there has been no need to re-write them for
production use. Examples include an online OAI-PMH validator
<http://oval.base-search.net/>, a classifier for assigning DDC labels
to English or German abstracts <http://clfapi.base-search.net/>, and
an OAI-PMH interface for BASE, the Bielefeld Academic Search Engine.

If you work closely with researchers, it might be relevant to you that
Python has become the dominant language in scientific programming
because of great progress in the NumPy/SciPy/pandas frameworks.

The majority of our software is still developed in Perl – for
historical reasons, and because there are exciting new frameworks in
Perl such as the ETL framework Catmandu <http://librecat.org/>.

Cheers
Christian

-- 
  Christian Pietsch
  http://purl.org/net/pietsch


Re: [CODE4LIB] transforming marc to rdf

2013-12-05 Thread Christian Pietsch
Hi Eric,

you seem to have missed the Catmandu tutorial at SWIB13. Luckily there
is a basic tutorial and a demo online: http://librecat.org/

The demo happens to be about transforming MARC to RDF using the
Catmandu Perl framework. It gives you full flexibility by separating
the importer from the exporter and providing a domain specific
language for “fixing” the data in between. Catmandu also has easy
to use wrappers for popular search engines and databases (both SQL and
NoSQL), making it a complete ETL (extract, transform, load) toolkit.
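
The importer / fix / exporter separation can be pictured with a toy
pipeline. Note that this is a conceptual Python sketch only: it is not
Catmandu's API or its Fix language, and every function name and the
sample record are invented for illustration.

```python
import json


def importer(lines):
    """Yield one record (a dict) per JSON line; stands in for a MARC importer."""
    for line in lines:
        yield json.loads(line)


def apply_fixes(records, fixes):
    """Apply each fix function, in order, to every record."""
    for record in records:
        for fix in fixes:
            record = fix(record)
        yield record


def exporter(records):
    """Serialise records as JSON lines; stands in for an RDF exporter."""
    return [json.dumps(record, sort_keys=True) for record in records]


# Two hypothetical fixes: trim the title, and rename a field.
def trim_title(record):
    return {**record, "title": record["title"].strip()}


def rename_creator(record):
    record = dict(record)
    record["creator"] = record.pop("author")
    return record


source = ['{"title": " Walden ", "author": "Thoreau"}']
output = exporter(apply_fixes(importer(source), [trim_title, rename_creator]))
print(output[0])
```

Because the three stages only share plain records, any importer can be
combined with any exporter, and the fixes in between stay independent
of both formats.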

Disclosure: I am a Catmandu contributor. It's free and open source
software.

Cheers,
Christian


On Wed, Dec 04, 2013 at 09:59:46PM -0500, Eric Lease Morgan wrote:
> Converting MARC to RDF has been more problematic. There are various
> tools enabling me to convert my original MARC into MARCXML and/or
> MODS. After that I can reportably use a few tools to convert to RDF:
> 
>   * MARC21slim2RDFDC.xsl [3] - functions, but even for
> my tastes the resulting RDF is too vanilla. [4]
> 
>   * modsrdf.xsl [5] - optimal, but when I use my
> transformation engine (Saxon), I do not get XML
> but rather plain text
> 
>   * BIBFRAME Tools [6] - sports nice ontologies, but
> the online tools won’t scale for large operations

-- 
  Christian Pietsch · http://www.ub.uni-bielefeld.de/~cpietsch/
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany


Re: [CODE4LIB] We should use HTTPS on code4lib.org

2013-11-04 Thread Christian Pietsch
On Mon, Nov 04, 2013 at 01:45:12PM -0500, Ethan Gruber wrote:
> NSA broke it already

Very funny but untrue. While it is certainly possible to create an
insecure TLS certificate, as far as we know TLS has not been broken
in general.

It is still one of the most usable protections against eavesdropping
and is still recommended by independent encryption expert Bruce
Schneier [1] and the Freedom of the Press Foundation [2], among
others. The BSI (the German federal office for IT security) recently
recommended TLS 1.2 with Perfect Forward Secrecy [3] for all HTTPS
setups. I am sure it would be beneficial for code4lib.org,
too.
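
As a rough illustration of what "TLS 1.2 with forward secrecy" means
in configuration terms, here is a sketch using Python's standard ssl
module. The cipher string is an assumption of this sketch, not
something taken from the BSI recommendation, and a real server would
also need to load its certificate and key.

```python
import ssl

# Server-side context; certificate loading is omitted in this sketch.
context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)

# Refuse anything older than TLS 1.2.
context.minimum_version = ssl.TLSVersion.TLSv1_2

# For TLS 1.2, allow only ECDHE key exchange, which provides forward
# secrecy. TLS 1.3 suites, if the OpenSSL build lists them, are not
# affected by set_ciphers().
context.set_ciphers("ECDHE+AESGCM")

names = [cipher["name"] for cipher in context.get_ciphers()]
print(names)
```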

Cheers,
Christian
Bielefeld University Library, Germany

References:
[1] http://www.theguardian.com/world/2013/sep/05/nsa-how-to-remain-secure-surveillance
[2] https://pressfreedomfoundation.org/encryption-works
[3] http://www.pro-linux.de/news/1/20333/bsi-empfiehlt-tls-12-mit-perfect-forward-secrecy.html


Re: [CODE4LIB] Question for Institutional Repository Folks

2013-10-28 Thread Christian Pietsch
Hi Matt,

if you are certain that a PDF file was encumbered with DRM
restrictions by mistake, then you can easily remove DRM using a tool
from the free MuPDF software which is available for all major
operating systems including Windows: http://www.mupdf.com/

If you have the current version, the command line goes like this:
mutool clean old.pdf new.pdf

Older versions of MuPDF included a different executable for this:
pdfclean old.pdf new.pdf
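
If you need to script this across machines with different MuPDF
versions, a small wrapper can pick whichever executable is present.
This is a sketch only; the function name is made up, and it merely
builds the two command lines shown above.

```python
import shutil


def clean_command(old_pdf, new_pdf, which=shutil.which):
    """Return the command line for whichever MuPDF cleaning tool is installed."""
    if which("mutool"):
        return ["mutool", "clean", old_pdf, new_pdf]
    if which("pdfclean"):
        return ["pdfclean", old_pdf, new_pdf]
    raise FileNotFoundError("neither mutool nor pdfclean was found on PATH")


# Example call (needs MuPDF installed):
# import subprocess
# subprocess.run(clean_command("old.pdf", "new.pdf"), check=True)
```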

As for editing PDF files ... this is not what they are intended for,
but it is possible with tools like PDFedit <http://pdfedit.cz>, Gimp,
Inkscape or Scribus.

Cheers,
Christian


On Mon, Oct 28, 2013 at 01:13:24PM -0400, Matthew Sherman wrote:
> Can anyone give me some advice in how I can edit this to add the
> required note to the top of the PDF?  Any advice is welcome.

--
  Christian Pietsch
  http://purl.org/net/pietsch


Re: [CODE4LIB] pdf2txt [tesseract]

2013-10-18 Thread Christian Pietsch
Hi Padraic,

I have uploaded a shell script which happens to implement Robert
Haschart's recipe:
https://github.com/pietsch/Data-Munging/blob/master/ocr4pdf.sh

Enjoy!
Christian


On Fri, Oct 18, 2013 at 10:22:17AM +0100, Padraic Stack wrote:
> I would love to see that bash script if you could upload it.

-- 
   Christian Pietsch, http://www.ub.uni-bielefeld.de/~cpietsch/
   LibTec · Library Technology and Knowledge Management
   Bielefeld University Library, 33615 Bielefeld, Germany


Re: [CODE4LIB] pdf2txt [tesseract]

2013-10-17 Thread Christian Pietsch
Hi Eric,

On Thu, Oct 17, 2013 at 09:43:04AM -0400, Eric Lease Morgan wrote:
> Robert, can you outline the process you used to get Tesseract to do
> OCR agains PDF documents? I installed Tesseract a few months ago,
> but I couldn't figure out how to get to work against PDF, only some
> image files. Any pointers would be greatly appreciated. (Hmmm. Maybe
> Tesseract doesn't do PDF files, only image files, and I need to
> convert my PDFs to images, and then the to Tesseract.) --Eric Morgan

Once you have Tesseract installed, the easiest way to use it for
adding an OCR text layer to PDF files is this Ruby script IMHO:
https://github.com/gkovacs/pdfocr
Geza Kovacs wrote it for Cuneiform and an old version of OCRopus.
I added Tesseract support later.

If you cannot use Ruby for some reason, I could upload a Bash script
doing the same thing.

Cheers,
Christian

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany


Re: [CODE4LIB] ElasticSearch

2013-03-14 Thread Christian Pietsch
On Thu, Mar 14, 2013 at 06:49:28PM +, Lin, Kun wrote:
> That's something pretty pricy.

Are you joking? It's free and open-source software:
https://github.com/elasticsearch/elasticsearch

Some of my colleagues at Bielefeld University Library's LibTec department are
using it with LibreCat <http://librecat.org/> to power our university's central
publication data service PUB <http://pub.uni-bielefeld.de/>. They seem to be
happy with it. In other projects, we stick to Solr or even plain old Lucene.
What are you looking to use ES for?

Cheers,
Christian

-- 
  Christian Pietsch · http://purl.org/net/pietsch
  LibTec · Library Technology and Knowledge Management
  Bielefeld University Library, Bielefeld, Germany


Re: [CODE4LIB] Wikis

2012-07-25 Thread Christian Pietsch
Hi Nathan,

given the huge user base of MediaWiki, you would need very good
reasons (read: special requirements) to choose anything else. Also,
the large developer community makes MediaWiki a more future-proof
choice than anything commercial backed by a single company.

On Tue, Jul 24, 2012 at 04:34:27PM -0400, Nathan Tallman wrote:
> There are a plethora of options for wiki software. Does anyone have any
> recommendations for a platform that's easy-to-use and has a low-learning
> curve for users? 

I think it is fair to say that everyone who uses the Internet also
uses Wikipedia, either passively or actively. Have you noticed that
search engines will usually return a link to a Wikipedia article on
the first page of results, no matter what you are looking for? Hence,
there will be no learning curve if you choose MediaWiki.

At my university, I run a small internal MediaWiki farm for purposes
like yours. My signature below links to two spare-time projects: These
are public MediaWiki installations I run elsewhere on a rented virtual
private server (Linux VPS). One is using the Semantic MediaWiki
extension to implement a database of text generation software systems
and related publications; the other serves as a lightweight Web
content management system (WCMS) for a special interest group of a
research association. I have found MediaWiki easy to use, install and
maintain, and so far I have always found a suitable free extension
whenever the included functionality did not suffice. On the other hand,
if you need fine-grained access controls, then you do not want a wiki
but a full, traditional WCMS.

Cheers,
Christian

-- 
   Christian Pietsch
   http://www.nlg-wiki.org/ · http://www.sigsem.org/
   Bielefeld University Library and CRC 882
   Bielefeld, Germany




Re: [CODE4LIB] Any ideas for free pdf to excel conversion?

2011-12-15 Thread Christian Pietsch
On Wed, Dec 14, 2011 at 02:19:43PM -0600, Jon Gorman wrote:
> pdftotext -> some cut & paste / sed / regex -> open in excel?
> 
> You might need to fiddle with the pdftotext settings, but I've been
> pretty successful with that before doing something else.

This is how I use pdftotext for this purpose:
pdftotext -nopgbrk -layout input.pdf output.txt

For those who wonder what this is: pdftotext is a command-line tool
from the poppler-utils package (this is what it is called in Debian and
Ubuntu Linux; see http://poppler.freedesktop.org for source code).
The Windows version is here: http://www.foolabs.com/xpdf/download.html

The resulting file, here called output.txt, contains plain text with
the layout left approximately intact. Now you can (manually or
otherwise) save the tables from this file into files with .csv, .tsv
or .dat extensions, and with any luck, R's read.table() function and
other statistics software, as well as most spreadsheet software,
will be able to open these files and make sense of them. Otherwise, you
will need to do some postprocessing/postediting.
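
For the "or otherwise" part, a simple heuristic often gets you most of
the way: treat runs of two or more spaces as column separators. The
following Python sketch does that on a made-up sample; the sample
table and the function names are invented for illustration, and the
heuristic will misalign rows that have empty cells or columns
separated by a single space.

```python
import csv
import io
import re


def layout_text_to_rows(text):
    """Split each non-empty line into cells wherever two or more spaces occur."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


def rows_to_csv(rows):
    """Serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample resembling `pdftotext -layout` output for a small table.
sample = """\
Library        Loans   Visits
Main Branch    1200    3400

East Side      310     980
"""

rows = layout_text_to_rows(sample)
print(rows_to_csv(rows), end="")
```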

Cheers,
Christian

-- 
  Christian Pietsch <http://purl.org/net/pietsch>




Re: [CODE4LIB] Linux Laptop

2011-12-15 Thread Christian Pietsch
Hi Chris,

congratulations on your decision. I went from DOS and Windows to Linux
and Mac OS X, but after a few months I returned to Linux for good
(firing up Windows only to fill out the occasional MS Word form that
looks weird in LibreOffice). You have already received a lot of good
advice, so as well as adding my own 2 cents, I will try to take that
into account.


Which Linux?
____________

My guess is that coming from Mac OS, Ubuntu will be the Linux
distribution you will feel most comfortable with. It is the most
popular Linux distro these days anyway, so you can hardly go wrong
with it. I have used it almost exclusively in recent years, and I find
it worth mentioning that the Ubuntu community is helpful and friendly
indeed. You will not often find arrogant BOFH responses in Ubuntu
forums because all Ubuntu contributors have signed a very reasonable
code of conduct <http://www.ubuntu.com/project/about-ubuntu/conduct>.

As others have mentioned, you might want to try some Linux
distributions and desktop environments in a virtual machine running on
top of Mac OS X. VirtualBox is popular and free, and works just as
well as the competition from VMware or Parallels.


Which hardware?
_______________

You may not need new hardware at all. Since the MacBook (Pro) is
considered by many the best laptop (hardware), you may want to use it
for Linux (and Windows, if you must) as well. The Ubuntu guide for
people switching from Mac OS X contains some dual boot advice:
https://help.ubuntu.com/community/SwitchingToUbuntu/FromMacOSX

I think it is fair to say that Linux runs on any laptop out there. If
you want to make sure that every single feature is supported on the
particular machine you have in mind, then take a look at the lists
Chris Fitzpatrick provided (quoted below), or this one:
http://tuxmobil.org/mylaptops.html
Specifically for Ubuntu Linux, you will find compatibility reports on
https://wiki.ubuntu.com/HardwareSupport/Machines/Laptops which is now
being replaced by this site: https://friendly.ubuntu.com/

I have installed Ubuntu on Asus EeePC netbooks, Dell laptops and
desktops, and Fujitsu servers. Of these, the Dell computers have
caused no trouble at all. I have also heard good things about Lenovo's
Thinkpads.

Enjoy!
Christian


On Wed, Dec 14, 2011 at 12:02:40PM -0500, Chris Gray wrote:
> It's worth Googling a bit.  There are places that sell laptops with
> Linux pre-installed (which bypasses the Windows surtax on new PCs).
> It was easy to find these but I can't vouch for any of them.
> 
> http://mcelrath.org/laptops.html - Linux Laptop Resellers
> 
> http://www.linux-laptop.net/ - Linux on Laptops
> 
> http://www.linuxcertified.com/linux_laptops.html - Linux Laptop -
> Fully Supported & Configured High Performance Linux Laptops and
> Netbooks | LinuxCertified
> 
> http://linuxpreloaded.com/ - Buy a Linux Computer

-- 
  Christian Pietsch <http://purl.org/net/pietsch>
  computational linguist, Bielefeld University, Germany

