Re: [CODE4LIB] quick question: CloudFlare
The Cloudflare FAQ says that CNAME setup is considered on a case-by-case basis. Has anyone successfully set up a CNAME with a paid Cloudflare account?

Thanks
Re: [CODE4LIB] quick question: CloudFlare
That’s a bit sub-optimal regarding how they handle domain setup, I agree. You can get partial functionality by adding an NS record in your existing DNS servers that points specific records to their DNS servers, even without going through the full domain delegation process. After some testing, we were sufficiently happy with their service to move forward with the full delegation, but this technique worked well for kicking the tires without making the full commitment to their DNS service.

The downside to using the NS trick is that their SSL handling will not be fully active unless you do the whole domain. Depending on what you hope to accomplish, that may be the make-or-break decision for using their service or not. You can still do SSL on the host under some circumstances, but I believe all entries in the top level domain must use their certificates when acceleration is active. Subdomains can still use the SSL certificate on the host even without full delegation.

Another reason to consider letting them handle your DNS (if you can) is that they have some pretty interesting plans for adding DNSSEC support later this year.

At any rate, what I would suggest you consider is something like this:

    test    IN NS ns1.ns.cloudflare.com
            IN NS ns2.ns.cloudflare.com

and replace ns1 and ns2 with the name servers assigned to your account. Of course, you need a “test” record created on the CloudFlare end to serve the appropriate DNS entries. This configuration will send all DNS queries for the test host to CloudFlare’s servers and through their acceleration infrastructure.

Hope this helps,
Andrew

--
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes
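For anyone who wants to script the NS trick described above, here is a minimal sketch that only formats the delegation lines for a BIND-style zone file. The function name `delegation_records` is hypothetical, the name servers are the placeholders from the message, and the trailing dot on each target is an assumption about zone-file syntax (it keeps the zone origin from being appended); adjust for whatever your DNS server expects.

```python
def delegation_records(label, nameservers):
    """Return zone-file lines that delegate `label` to `nameservers`."""
    if not nameservers:
        raise ValueError("at least one name server is required")
    lines = []
    for index, nameserver in enumerate(nameservers):
        # A trailing dot marks the target as fully qualified.
        target = nameserver if nameserver.endswith(".") else nameserver + "."
        # Only the first line names the owner; a blank owner field
        # inherits the owner of the previous record.
        owner = label if index == 0 else ""
        lines.append(f"{owner:<8}IN NS {target}")
    return lines


# Placeholder name servers; use the pair assigned to your account.
for line in delegation_records(
    "test", ["ns1.ns.cloudflare.com", "ns2.ns.cloudflare.com"]
):
    print(line)
```

This only produces text to paste into a zone; it does not create the matching "test" record on the CloudFlare side.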
Re: [CODE4LIB] quick question: CloudFlare
You can create a new domain for your CloudFlare server and use a CNAME record to point to that. Depending on how CloudFlare is configured, it may not give you the results you want.

Cary
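As a rough illustration of the CNAME approach, the sketch below formats a single CNAME line for a BIND-style zone file. The helper name `cname_record` and the target host `proxied-host.example.net` are hypothetical placeholders, not anything CloudFlare assigns. The one firm constraint it encodes is standard DNS behaviour: a CNAME cannot sit at the zone apex, because the apex already carries SOA and NS records.

```python
def cname_record(label, target):
    """Format a CNAME line for `label`, relative to the zone origin."""
    if label in ("", "@"):
        # A CNAME may not coexist with the SOA and NS records at the apex.
        raise ValueError("CNAME is not allowed at the zone apex")
    # A trailing dot marks the target as fully qualified.
    fqdn = target if target.endswith(".") else target + "."
    return f"{label:<8}IN CNAME {fqdn}"


# Hypothetical example: point "www" at a placeholder proxied host name.
print(cname_record("www", "proxied-host.example.net"))
```

The apex restriction is one reason a CNAME-only setup works for hosts such as "www" but not for the bare domain.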
Re: [CODE4LIB] quick question: CloudFlare
In most cases, Cloudflare will want you to delegate the whole domain to their DNS servers. This is impossible for us to do. Therefore, I am trying to figure out the CNAME option.

Thanks
Kun
Re: [CODE4LIB] quick question: CloudFlare
We have had good experience with it so far, yes. Do you have a specific use case that you’re concerned about?

--
Andrew Anderson, Director of Development, Library and Information Resources Network, Inc.
http://www.lirn.net/ | http://www.twitter.com/LIRNnotes | http://www.facebook.com/LIRNnotes
[CODE4LIB] Job: Library Technology Architect at Harvard University
Library Technology Architect
Harvard University
Cambridge, MA

**Harvard University**

**School/Unit** Harvard University Information Technology
**Sub-Unit**
**Job Function** Information Technology
**Time Status** Full-time
**Department** Library Technology Services
**Salary Grade** 059
**Union** 00 - Non Union, Exempt or Temporary

**Duties & Responsibilities**

Reporting to the Managing Director of Library Technology Services (LTS), Harvard University Information Technology (HUIT) seeks a Library Technology Architect. This new role will be a member of a high performing leadership team which focuses on the development of powerful solutions to advance scholarship and teaching through the collection, creation, application, preservation, and dissemination of knowledge. LTS works in deep collaboration with the Harvard Library (HL), one of the premier research libraries in the world, as well as with HUIT's Academic Technology team to ensure that library materials and services are effectively embedded into the academic enterprise.

The Library Technology Architect will also work in close collaboration with digital library software engineers and systems librarians on specific library technology projects which will support the Harvard Library strategic objectives. Key responsibilities of the position will be to assess the current state of library enterprise architecture, to develop road maps for all major components of the enterprise library technology environment, and to develop a multi-year IT strategy which aligns with the strategic objectives of the Harvard Library.

Initial focus areas will include participation in projects in the following areas:

* Participate in a technical review of options for a new Library Services Platform to assess integration and compliance with Harvard enterprise architecture
* Collaborate with key stakeholders in LTS and HL on architectural aspects of the developing landscape of tools used to collect and create digital library collections
* Play an active role in the ongoing assessment and architectural planning for digital repository services for the Harvard Library
* Develop a plan for interoperability and to promote the use of APIs in the library environment
* Play a lead role in external collaborative developments such as LD4L and the IIIF consortium

**Basic Qualifications**

* Master's degree in computer science or a related field or an equivalent combination of education and experience, required
* Ten or more years of experience architecting complex enterprise information systems
* Experience contributing to the successful development and operation of enterprise-scale information systems as reliable infrastructure, including hardware, software, middleware, and supporting human processes

**Additional Qualifications Required:**

* Expert knowledge of IT infrastructure and current standards, including architecting and integrating multi-tiered system architectures
* Solid knowledge of server and storage architectures, as well as cloud based solutions; IT middleware, including authentication, authorization, account provisioning, identity management, and directory services, and proven experience designing systems that integrate with / incorporate these services
* Demonstrated ability to deliver results in a complex and demand driven environment, working collaboratively with diverse stakeholders, colleagues at peer institutions and open source communities
* Knowledge of key trends in digital developments in research libraries including open source community projects, linked open data, bibliographic standards, metadata schemas, and digital repositories
* Proven ability to capture, document and convey complex business needs and processes, translate them into functional requirements and system specifications, and develop a supporting technical design at an abstract level, and detailed technical specifications (including choice of implementation technologies)
* Sophisticated understanding of how information is organized and used to support the academic and research missions of the university
[CODE4LIB] Call for Presentations - Code4lib-SoCal - Aug 28th, 2015
The next quarterly meetup of code4lib-SoCal will be held at UCLA on August 28th. Please contact me and Gary Thompson (glt *at* library.ucla.edu) if you would like to give a talk or lead a workshop.

You can RSVP for the meetup here: http://www.meetup.com/Code4lib-SoCal/events/223360922/

Thanks,
Josh

Joshua Gomez | Sr. Software Engineer
Getty Research Institute | Los Angeles, CA
(310) 440-7410
[CODE4LIB] quick question: CloudFlare
Quick question: Who is using CloudFlare for their library website? Are they very accommodating in using CNAME? Thanks Kun Lin
Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database
See also http://wiki.tei-c.org/index.php/Heuristics , which discusses this problem more broadly conceived. I've just added a link to the archives of this very discussion.

--Kevin

On 6/18/15 12:52 PM, Matt Sherman wrote:

The hope is to take these bibliographies and put them into more of a web searchable/sortable format for researchers to make use of them. My colleague was taking some inspiration from the Marlowe Bibliography ( https://marlowebibliography.org/), though we are hoping to possibly get a bit more robust with the bibliography we are working on. The important first step is to be able to parse the existing OCRed bibliography scans we have into a database, possibly a custom XML format, but a database will probably be easier to append and expand down the road.

On Thu, Jun 18, 2015 at 1:11 PM, Kyle Banerjee wrote:

How you want to preprocess and structure the data depends on what you hope to achieve. Can you say more about what you want the end product to look like?

kyle

On Thu, Jun 18, 2015 at 10:08 AM, Matt Sherman wrote:

That is a pretty good summation of it, yes. I appreciate the suggestions. This is a bit of a new realm for me, and while I know what I want it to do and the structure I want to put it in, the conversion process has been eluding me, so thanks for giving me some tools to look into.

On Thu, Jun 18, 2015 at 1:04 PM, Eric Lease Morgan wrote:

On Jun 18, 2015, at 12:02 PM, Matt Sherman wrote:

> I am working with a colleague on a side project which involves some
> scanned bibliographies and making them more web
> searchable/sortable/browse-able. While I am quite familiar with the
> metadata and organization aspects we need, I am at a bit of a loss on
> how to automate the process of putting the bibliography in a more
> structured format so that we can avoid going through hundreds of pages
> by hand. I am pretty sure regular expressions are needed, but I have
> not had an instance where I need to automate extracting data from one
> file type (PDF OCR or text extracted to Word doc) and place it into
> another (either a database or an XML file) with some enrichment. I
> would appreciate any suggestions for approaches or tools to look into.
> Thanks for any help/thoughts people can give.

If I understand your question correctly, then you have two problems to address: 1) converting PDF, Word, etc. files into plain text, and 2) marking up the result (which is a bibliography) into structured data. Correct?

If so, then if your PDF documents have already been OCRed, or if you have other files, then you can probably feed them to TIKA to quickly and easily extract the underlying plain text. [1] I wrote a brain-dead shell script to run TIKA in server mode and then convert Word (.docx) files. [2]

When it comes to marking up the result into structured data, well, good luck. I think such an application is something Library Land sought for a long time. “Can you say Holy Grail?”

[1] Tika - https://tika.apache.org
[2] brain-dead script - https://gist.github.com/ericleasemorgan/c4e34ffad96c0221f1ff

-- Eric
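The Tika step mentioned above can be sketched in Python rather than shell. This is not the script linked in [2]; it is a minimal illustration that assumes a Tika server is already running at its default address (`http://localhost:9998/tika`), and the function names `build_tika_request` and `extract_text` are hypothetical.

```python
import urllib.request

# Assumption: a tika-server instance is listening on its default port.
TIKA_URL = "http://localhost:9998/tika"


def build_tika_request(document_bytes, url=TIKA_URL):
    """Build (but do not send) a PUT request asking Tika for plain text."""
    return urllib.request.Request(
        url,
        data=document_bytes,
        method="PUT",
        headers={"Accept": "text/plain"},
    )


def extract_text(path, url=TIKA_URL):
    """Send one file to the Tika server and return the extracted text."""
    with open(path, "rb") as handle:
        request = build_tika_request(handle.read(), url)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `extract_text("bibliography.docx")` (a placeholder file name) would return the plain text of that document, which is then the input for whatever citation parsing comes next.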
Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database
Hi all,

As Matt's problem is related to parsing citations, I would definitely have a look at the tools cited by Cindy, because going with regular expressions will quickly become a nightmare. Even if the citations were created following a common reference style, there will inevitably be inconsistencies, amplified by the OCR process. This kind of tool already tries to deal with that, so just give it a try (FreeCite lists other tools and libraries that try to accomplish this).

Looks like a fun project, btw!

Regards,
Sylvain

-----Original Message-----
From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Harper, Cynthia
Sent: 18 June 2015 19:49
To: CODE4LIB@LISTSERV.ND.EDU
Subject: Re: [CODE4LIB] Desiring Advice for Converting OCR Text into Metadata and/or a Database

Eric or others, do you know of any utility that converts a PDF and retains coding for where the font or font style changes? Or converts a web page with associated CSS and notes where font-style and HTML text blocks stop and start? It seems that would be the starting point for recognizing citation entities.

I've seen websites for FreeCite http://freecite.library.brown.edu/ and ParsCit http://aye.comp.nus.edu.sg/parsCit/ through web searches, but don't know how close they got to the Grail before becoming legend.

Cindy Harper
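To make the point about regular expressions concrete, here is a small sketch of a naive regex-based citation parser. The entry shape ("Surname, Given. Title. Place: Publisher, Year.") is an assumption chosen for illustration, the sample citation is invented, and `parse_citation` is a hypothetical helper, not part of FreeCite or ParsCit.

```python
import re

# Assumed entry shape: "Surname, Given. Title. Place: Publisher, Year."
CITATION = re.compile(
    r"^(?P<author>[^.]+?)\.\s+"
    r"(?P<title>.+?)\.\s+"
    r"(?P<place>[^:.]+):\s+"
    r"(?P<publisher>[^,]+),\s+"
    r"(?P<year>\d{4})\.?$"
)


def parse_citation(line):
    """Return a dict of fields, or None when the line does not fit."""
    match = CITATION.match(line.strip())
    return match.groupdict() if match else None


# Invented sample entry, clean and then with one OCR-style error.
clean = "Doe, Jane. An Example Title. Boston: Example Press, 1901."
noisy = "Doe, Jane. An Example Title. Boston: Example Press, 19O1."

print(parse_citation(clean))
print(parse_citation(noisy))  # the letter O in the year breaks the match
```

A single misread character is enough to lose the whole entry, and an author written with initials ("Doe, J.") would break the pattern as well, which is why the dedicated citation parsers are worth trying first.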