Re: [CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Kun Lin
The Cloudflare FAQ says that CNAME setup is considered on a case-by-case
basis. Has anyone successfully set up a CNAME with a paid Cloudflare account?

Thanks



> -----Original Message-----
> From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf
> Of Andrew Anderson
> Sent: Friday, June 19, 2015 3:24 PM
> To: CODE4LIB@LISTSERV.ND.EDU
> Subject: Re: [CODE4LIB] quick question: CloudFlare
>
> We have had good experience with it so far, yes.  Do you have a
> specific use case that you're concerned about?


Re: [CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Andrew Anderson
I agree, the way they handle domain setup is a bit sub-optimal.  You can get 
partial functionality by adding NS records in your existing DNS servers that 
point specific host names to their DNS servers, without going through the 
full domain delegation process.  After some testing, we were sufficiently 
happy with their service to move forward with the full delegation, but this 
technique worked well for kicking the tires without making the full 
commitment to their DNS service.

The downside of the NS trick is that their SSL handling will not be fully 
active unless you delegate the whole domain.  Depending on what you hope to 
accomplish, that may be the make-or-break factor in deciding whether to use 
their service.  You can still do SSL on the host under some circumstances, 
but I believe all entries in the top level domain must use their certificates 
when acceleration is active.  Subdomains can still use the SSL certificate on 
the host even without full delegation.

Another reason to consider letting them handle your DNS (if you can) is that 
they have some pretty interesting plans to add DNSSEC support later this 
year.

At any rate, what I would suggest you consider is something like this:

test    IN  NS  ns1.ns.cloudflare.com.
        IN  NS  ns2.ns.cloudflare.com.

and replace ns1 and ns2 with the name servers assigned to your account.

Of course, you need a “test” record created on the CloudFlare end to serve the 
appropriate DNS entries.  This configuration will send all DNS queries for the 
test host to CloudFlare’s servers and through their acceleration infrastructure.
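
If you would rather script the delegation lines than type them, here is a 
minimal sketch in Python.  The function name is made up for illustration, and 
the name servers are the same placeholders as above, so substitute the ones 
assigned to your account.  It assumes a BIND-style zone file, where a target 
without a trailing dot is treated as relative to the zone origin.

```python
def delegation_records(label, nameservers):
    """Return BIND-style NS lines delegating `label` to `nameservers`.

    Hypothetical helper for illustration; not part of any CloudFlare tooling.
    """
    lines = []
    for i, ns in enumerate(nameservers):
        # Names without a trailing dot are relative to the zone origin,
        # so make each target fully qualified.
        target = ns if ns.endswith(".") else ns + "."
        # Only the first line carries the owner name; a line that starts
        # with whitespace inherits the owner of the previous record.
        owner = label if i == 0 else ""
        lines.append(f"{owner:<8}IN  NS  {target}")
    return lines


if __name__ == "__main__":
    # Placeholder name servers; use the pair assigned to your account.
    for line in delegation_records(
        "test", ["ns1.ns.cloudflare.com", "ns2.ns.cloudflare.com"]
    ):
        print(line)
```

The output can be pasted into the zone for the parent domain; the matching 
“test” record still has to exist on the CloudFlare side.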

Hope this helps,
Andrew

-- 
Andrew Anderson, Director of Development, Library and Information Resources 
Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | 
http://www.facebook.com/LIRNnotes

On Jun 19, 2015, at 18:29, Kun Lin wrote:

> In most cases, Cloudflare will want you to delegate the whole domain to
> their DNS servers. This is impossible for us to do. Therefore, I am trying
> to figure out the CNAME option.
> 
> Thanks
> Kun


Re: [CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Cary Gordon
You can create a new domain for your CloudFlare server and use a CNAME record 
to point to that. Depending on how CloudFlare is configured, it may not give 
you the results you want.

Cary


> On Jun 19, 2015, at 3:29 PM, Kun Lin wrote:
> 
> In most cases, Cloudflare will want you to delegate the whole domain to
> their DNS servers. This is impossible for us to do. Therefore, I am trying
> to figure out the CNAME option.
> 
> Thanks
> Kun


Re: [CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Kun Lin
In most cases, Cloudflare will want you to delegate the whole domain to
their DNS servers. This is impossible for us to do. Therefore, I am trying
to figure out the CNAME option.

Thanks
Kun

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of
Andrew Anderson
Sent: Friday, June 19, 2015 3:24 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] quick question: CloudFlare

We have had good experience with it so far, yes.  Do you have a specific
use case that you're concerned about?



Re: [CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Andrew Anderson
We have had good experience with it so far, yes.  Do you have a specific use 
case that you’re concerned about?

-- 
Andrew Anderson, Director of Development, Library and Information Resources 
Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | 
http://www.facebook.com/LIRNnotes

On Jun 19, 2015, at 12:58, Kun Lin wrote:

> Quick question:
> 
> Who is using CloudFlare for their library website? Are they very
> accommodating in using CNAME?
> 
> Thanks
> 
> Kun Lin


[CODE4LIB] Job: Library Technology Architect at Harvard University

2015-06-19 Thread jobs
Library Technology Architect
Harvard University
Cambridge, MA

**Harvard University**

**School/Unit**  
Harvard University Information Technology

**Sub-Unit**  

**Job Function**  
Information Technology

**Time Status**  
Full-time

**Department**  
Library Technology Services

**Salary Grade**  
059

**Union**  
00 - Non Union, Exempt or Temporary

**Duties & Responsibilities**  
Reporting to the Managing Director of Library Technology Services (LTS),
Harvard University Information Technology (HUIT) seeks a Library Technology
Architect. This new role will be a member of a high performing leadership team
which focuses on the development of powerful solutions to advance scholarship
and teaching through the collection, creation, application, preservation, and
dissemination of knowledge. LTS works in deep collaboration
with the Harvard Library (HL), one of the premier research libraries in the
world, as well as with HUIT's Academic Technology team to ensure that library
materials and services are effectively embedded into the academic
enterprise. The Library Technology Architect will also work
in close collaboration with digital library software engineers and systems
librarians on specific library technology projects which will support the
Harvard Library strategic objectives.

  
Key responsibilities of the position will be to assess the current state of
library enterprise architecture, develop road maps for all major components of
the enterprise library technology environment and to develop a multi-year IT
strategy which aligns with the strategic objectives of the Harvard Library.

  
Initial focus areas will include participation in projects in the following
areas:

  *  Participate in a technical review of options for a new Library Services 
Platform to assess integration and compliance with Harvard enterprise 
architecture
  *  Collaborate with key stakeholders in LTS and HL on architectural aspects 
of the developing landscape of tools used to collect and create digital 
library collections
  *  Play an active role in the ongoing assessment and architectural planning 
for digital repository services for the Harvard Library
  *  Develop a plan for interoperability and promote the use of APIs in the 
library environment
  *  Play a lead role in external collaborative developments such as LD4L and 
the IIIF consortium
  
**Basic Qualifications**  

  *  Master's degree in computer science or a related field or an equivalent 
combination of education and experience, required
  *  Ten or more years of experience architecting complex enterprise 
information systems
  *  Experience contributing to the successful development and operation of 
enterprise-scale information systems as reliable infrastructure, including 
hardware, software, middleware, and supporting human processes
  
  
  
**Additional Qualifications  
Required:**

  *  Expert knowledge of IT infrastructure and current standards, including 
architecting and integrating multi-tiered system architectures
  *  Solid knowledge of server and storage architectures, as well as 
cloud-based solutions; IT middleware, including authentication, authorization, 
account provisioning, identity management, and directory services, and proven 
experience designing systems that integrate with / incorporate these services
  *  Demonstrated ability to deliver results in a complex and demand-driven 
environment, working collaboratively with diverse stakeholders, colleagues at 
peer institutions and open source communities
  *  Knowledge of key trends in digital developments in research libraries 
including open source community projects, linked open data, bibliographic 
standards, metadata schemas, and digital repositories
  *  Proven ability to capture, document and convey complex business needs and 
processes, translate them into functional requirements and system 
specifications, and develop a supporting technical design at an abstract level, 
and detailed technical specifications (including choice of implementation 
technologies)
  *  Sophisticated understanding of how information is organized and used to 
support the academic and research missions of the university

[CODE4LIB] Call for Presentations - Code4lib-SoCal - Aug 28th, 2015

2015-06-19 Thread Joshua Gomez
The next quarterly meetup of code4lib-SoCal will be held at UCLA on August 
28th. Please contact me and Gary Thompson (glt *at* library.ucla.edu) if you 
would like to give a talk or lead a workshop.

You can RSVP for the meetup here: 
http://www.meetup.com/Code4lib-SoCal/events/223360922/

Thanks,
Josh

Joshua Gomez | Sr. Software Engineer
Getty Research Institute | Los Angeles, CA
(310) 440-7410


[CODE4LIB] quick question: CloudFlare

2015-06-19 Thread Kun Lin
Quick question:

Who is using CloudFlare for their library website? Are they very
accommodating in using CNAME?

Thanks

Kun Lin


Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-19 Thread Kevin Hawkins
See also http://wiki.tei-c.org/index.php/Heuristics , which discusses 
this problem more broadly conceived.  I've just added a link to the 
archives of this very discussion.  --Kevin


On 6/18/15 12:52 PM, Matt Sherman wrote:

> The hope is to take these bibliographies and put them into a more web
> searchable/sortable format for researchers to make use of.  My
> colleague was taking some inspiration from the Marlowe Bibliography (
> https://marlowebibliography.org/), though we are hoping to possibly get a
> bit more robust with the bibliography we are working on.  The important
> first step is to be able to parse the existing OCRed bibliography scans we
> have into a database, possibly a custom XML format but a database will
> probably be easier to append and expand down the road.
>
> On Thu, Jun 18, 2015 at 1:11 PM, Kyle Banerjee wrote:
>
>> How you want to preprocess and structure the data depends on what you hope
>> to achieve. Can you say more about what you want the end product to look
>> like?
>>
>> kyle
>>
>> On Thu, Jun 18, 2015 at 10:08 AM, Matt Sherman wrote:
>>
>>> That is a pretty good summation of it, yes.  I appreciate the suggestions;
>>> this is a bit of a new realm for me, and while I know what I want it to do
>>> and the structure I want to put it in, the conversion process has been
>>> eluding me, so thanks for giving me some tools to look into.
>>>
>>> On Thu, Jun 18, 2015 at 1:04 PM, Eric Lease Morgan wrote:
>>>
>>>> On Jun 18, 2015, at 12:02 PM, Matt Sherman wrote:
>>>>
>>>>> I am working with a colleague on a side project which involves some
>>>>> scanned bibliographies and making them more web
>>>>> searchable/sortable/browse-able.
>>>>> While I am quite familiar with the metadata and organization aspects
>>>>> we need, I am at a bit of a loss on how to automate the process of
>>>>> putting the bibliography in a more structured format so that we can
>>>>> avoid going through hundreds of pages by hand.  I am pretty sure
>>>>> regular expressions are needed, but I have not had an instance where I
>>>>> needed to automate extracting data from one file type (PDF OCR or text
>>>>> extracted to Word doc) and placing it into another (either a database
>>>>> or an XML file) with some enrichment.  I would appreciate any
>>>>> suggestions for approaches or tools to look into.  Thanks for any
>>>>> help/thoughts people can give.
>>>>
>>>> If I understand your question correctly, then you have two problems to
>>>> address: 1) converting PDF, Word, etc. files into plain text, and 2)
>>>> marking up the result (which is a bibliography) into structured data.
>>>> Correct?
>>>>
>>>> If so, then if your PDF documents have already been OCRed, or if you
>>>> have other files, then you can probably feed them to TIKA to quickly and
>>>> easily extract the underlying plain text. [1] I wrote a brain-dead shell
>>>> script to run TIKA in server mode and then convert Word (.docx) files. [2]
>>>>
>>>> When it comes to marking up the result into structured data, well, good
>>>> luck. I think such an application is something Library Land has sought
>>>> for a long time. “Can you say Holy Grail?”
>>>>
>>>> [1] Tika - https://tika.apache.org
>>>> [2] brain-dead script -
>>>> https://gist.github.com/ericleasemorgan/c4e34ffad96c0221f1ff
>>>>
>>>> --
>>>> Eric


Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

2015-06-19 Thread Sylvain Machefert
Hi all,
As Matt's problem is related to parsing citations, I would definitely have a 
look at the tools cited by Cindy, because going with regexps will quickly 
become a nightmare. Even if the citations were created following a common 
reference style, there will inevitably be inconsistencies, amplified by the 
OCR process. These tools already try to deal with that, so just give them a 
try (FreeCite lists other tools and libraries that try to accomplish this).
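
To illustrate why the regexp route gets painful, here is a minimal Python 
sketch. The pattern, the function name and the sample citations are all 
invented for this example (they are not taken from Matt's bibliography), and 
the pattern assumes a single tidy style of the form "Author. Title. Place: 
Publisher, Year."

```python
import re

# Hypothetical pattern for one tidy citation style:
#   "Surname, Given. Title. Place: Publisher, Year."
CITATION = re.compile(
    r"^(?P<author>[^.]+)\.\s+"
    r"(?P<title>[^.]+)\.\s+"
    r"(?P<place>[^:]+):\s+"
    r"(?P<publisher>[^,]+),\s+"
    r"(?P<year>\d{4})\.$"
)


def parse_citation(line):
    """Return a dict of citation fields, or None if the line does not match."""
    match = CITATION.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    samples = [
        # Clean, made-up citation: parses as intended.
        "Smith, John. A History of Printing. London: Example Press, 1987.",
        # OCR noise (";" for ":", "l" for "1"): no match at all.
        "Smith, J0hn. A History of Printing. London; Example Press, l987.",
        # Initials in the author name: matches, but the fields are wrong.
        "Smith, J. R. A History of Printing. London: Example Press, 1987.",
    ]
    for sample in samples:
        print(parse_citation(sample))
```

The third sample is the worrying one: the pattern still matches, but the 
title comes back as "R" and the place swallows the real title, so the error 
passes silently unless every record is checked by hand.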

Looks like a fun project btw!

Regards,
Sylvain

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Harper, 
Cynthia
Sent: 18 June 2015 19:49
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata 
and/or a Database

Eric or others, do you know of any utility that converts a PDF and retains 
markup showing where the font or font style changes? Or one that converts a 
web page with its associated CSS and notes where each font style and HTML text 
block starts and stops? It seems that would be the starting point for 
recognizing citation entities. I've seen websites for FreeCite 
http://freecite.library.brown.edu/ and ParsCit 
http://aye.comp.nus.edu.sg/parsCit/ through web searches, but don't know how 
close they got to the Grail before becoming legend.

Cindy Harper

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Eric 
Lease Morgan
Sent: Thursday, June 18, 2015 1:04 PM
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata 
and/or a Database

On Jun 18, 2015, at 12:02 PM, Matt Sherman wrote:

> I am working with a colleague on a side project which involves some
> scanned bibliographies and making them more web
> searchable/sortable/browse-able.
> While I am quite familiar with the metadata and organization aspects
> we need, I am at a bit of a loss on how to automate the process of
> putting the bibliography in a more structured format so that we can
> avoid going through hundreds of pages by hand.  I am pretty sure
> regular expressions are needed, but I have not had an instance where I
> needed to automate extracting data from one file type (PDF OCR or text
> extracted to Word doc) and placing it into another (either a database or
> an XML file) with some enrichment.  I would appreciate any suggestions
> for approaches or tools to look into.  Thanks for any help/thoughts people
> can give.


If I understand your question correctly, then you have two problems to address: 
1) converting PDF, Word, etc. files into plain text, and 2) marking up the 
result (which is a bibliography) into structured data. Correct?

If so, then if your PDF documents have already been OCRed, or if you have other 
files, then you can probably feed them to TIKA to quickly and easily extract 
the underlying plain text. [1] I wrote a brain-dead shell script to run TIKA in 
server mode and then convert Word (.docx) files. [2]
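
For anyone who prefers Python to shell, the same idea can be sketched roughly 
as follows. This is not the script from [2]; it is a minimal illustration 
that assumes a Tika server is already running locally on its default port 
(9998), and the function names are made up for this example.

```python
import urllib.request


def build_tika_request(document_bytes, server="http://localhost:9998"):
    """Build a PUT request asking a Tika server for the plain text of a file.

    The server address is an assumption; change it to match your setup.
    """
    return urllib.request.Request(
        url=server.rstrip("/") + "/tika",
        data=document_bytes,
        method="PUT",
        # Ask for plain text rather than Tika's default XHTML output.
        headers={"Accept": "text/plain"},
    )


def extract_text(path, server="http://localhost:9998"):
    """Send the file at `path` to the Tika server and return the text."""
    with open(path, "rb") as handle:
        request = build_tika_request(handle.read(), server)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python tika_text.py some-file.docx
    print(extract_text(sys.argv[1]))
```

Building the request separately from sending it keeps the network call in one 
place, which makes the request easy to inspect without a server running.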

When it comes to marking up the result into structured data, well, good luck. I 
think such an application is something Library Land has sought for a long time. 
“Can you say Holy Grail?”

[1] Tika - https://tika.apache.org
[2] brain-dead script - 
https://gist.github.com/ericleasemorgan/c4e34ffad96c0221f1ff

—
Eric

