Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
Thanks for everyone's ideas. I think Shaun's solution is exactly what I'm trying to do. Shaun Ellis mailto:sha...@princeton.edu June 16, 2014 at 10:36 PM Ryan, it sounds like you simply want to pull two relational tables into drupal using referenced entities in a one (question) to many (answers) relationship? This can be accomplished, albeit unintuitively (it is Drupal afterall), by using the Feeds CSV parser with Feeds Tamper, and I would skip what seems like an unnecessary conversion to XML step. First, you import your questions and set a GUID target (typically your db id). Then import your answers -- each should have a reference to the id mapped using the Entity Reference by Feeds GUID. I have done it in reverse order too, in which case you'd import all your answers first. In this case, your second (questions) import needs to contain a single column of delimited answer GUIDs into a single column. Use the Feeds Tamper explode plugin on that field during mapping as explained (rather vaguely) here: http://drupal.stackexchange.com/questions/32234/how-to-use-feeds-module-to-import-multi-value-fields If that's not what you're trying to do, can you clarify? -Shaun Joshua Welker mailto:wel...@ucmo.edu June 16, 2014 at 2:35 PM Sorry, the last line got messed up by outlook. #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Joshua Welker [mailto:wel...@ucmo.edu] Sent: Monday, June 16, 2014 2:32 PM To: Code for Libraries Subject: RE: [CODE4LIB] Excel to XML (for a Drupal Feeds import) This should be quite doable in most programming languages with out-of-the-box tools and no tricky parsing code. The gist is to save in Excel as a delimited text file (tab is a good choice), then have your script ingest the document and turn it into an array, and then turn the array into XML. In Python, it could be something like the code below (not tested but the principles should be sound): import 'csv' from elementtree.ElementTree import Element, SubElement #create a list mylist = [] #open your delimited file with a csv reader with open('myfile.txt', 'rb') as textfile: reader = csv.reader( textfile, delimiter='\t', quotechat='') #this assumes your file is tab-delimited (\t) #loop through rows in your file and save each row as a key/value pair (dictionary) for row in textfile: fields = { 'field1': row[0] 'field2': row[1] 'field3': row[2] 'field4': row[3] } #append this row to our master list mylist.append( fields ) #create an xml root node rootNode = Element(XmlRoot) #loop through our list of rows from the text file and create xml nodes for row in mylist: rowNode = Element(record) #loop through all the fields on this row and turn them into xml nodes for fieldName, fieldValue in row: fieldNode = Element(fieldName) fieldNode.text = fieldValue #append each field node to the parent row node rowNode.append(fieldNode) #append each row node to the document root node rootNode.append(rowNode) #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Monday, June 16, 2014 1:04 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import) I'd just do this the old fashioned way. Awk is great for problems like this. For example, if your file is tab delimited, the following should work awk '{FS=\t}{if ($2 != ) question = $2;}{print $1,question,$3}'' yourfile In the example above, I just print the fields but you could easily encase them in tags. kyle Joshua Welker mailto:wel...@ucmo.edu June 16, 2014 at 2:32 PM This should be quite doable in most programming languages with out-of-the-box tools and no tricky parsing code. The gist is to save in Excel as a delimited text file (tab is a good choice), then have your script ingest the document and turn it into an array, and then turn the array into XML. In Python, it could be something like the code below (not tested but the principles should be sound): import 'csv' from elementtree.ElementTree import Element, SubElement #create a list mylist = [] #open your delimited file with a csv reader with open('myfile.txt', 'rb') as textfile: reader = csv.reader( textfile, delimiter='\t', quotechat='') #this assumes your file is tab-delimited (\t) #loop through rows in your file and save each row as a key/value pair (dictionary) for row in textfile: fields = { 'field1': row[0] 'field2': row[1] 'field3': row[2] 'field4': row[3] } #append this row to our master list mylist.append( fields ) #create an xml root node rootNode = Element(XmlRoot) #loop through our list of rows from the text file and create xml nodes for row in mylist: rowNode = Element(record) #loop through all the fields on this row and turn
Re: [CODE4LIB] Excel to XML
I'd do it with a Python script, but a Google search for csv to xml tool presents many hits. The top hit is a free online CSV to XML Converter at freeformatter.com. So export to CSV and away you go. http://www.freeformatter.com/csv-to-xml-converter.html Chris On 14-06-14 08:41 AM, Amelia Mowry wrote: You could also look into using an xsd. It's a pretty simple document that allows you to define the structure of your xml file and assign excel columns to fields. That way, when you save excel as xml, you can have the proper hierarchy in your xml. There are some quirks, but overall it's pretty effective. Amelia Mowry Metadata Discovery Services Librarian Wayne State University On Friday, June 13, 2014, Dana Pearson dbpearsonm...@gmail.com wrote: I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transformbibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana On Fri, Jun 13, 2014 at 6:53 PM, Terry Brady tw...@georgetown.edu wrote: The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote: Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly.-- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based on those results, I wasn't sure if there is more tweaking I could do in Excel, or if there is some programmed logic in Excel that can't be accounted for when associating a schema. Another suggestion I received was to fill the question column so that every row had a question listed. I did consider this, but the problem then is during the data import, I'd have to convince my CMS to put all the answers back together based on the question, something I'm sure Drupal COULD do, but I'm not sure how to do that either. Finally, this project is a spreadsheet with 225,270 rows, so you can imagine why I'd like a process that is reasonably trustworthy AND that can run locally. Anyway, any/all additional suggestions appreciated, even if they are try searching for blah blah python parser, or I made something that solves a similar process, and you can download it from Git. Ryan ___ Ryan Engel Web Stuff UW-Madison Dana Pearson mailto:dbpearsonm...@gmail.com June 13, 2014 at 7:14 PM I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transform bibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana Terry Brady mailto:tw...@georgetown.edu June 13, 2014 at 6:53 PM The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table Ryan Engel mailto:rten...@wisc.edu June 13, 2014 at 4:28 PM Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
I'd just do this the old fashioned way. Awk is great for problems like this. For example, if your file is tab delimited, the following should work awk '{FS=\t}{if ($2 != ) question = $2;}{print $1,question,$3}'' yourfile In the example above, I just print the fields but you could easily encase them in tags. kyle On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel rten...@wisc.edu wrote: Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based on those results, I wasn't sure if there is more tweaking I could do in Excel, or if there is some programmed logic in Excel that can't be accounted for when associating a schema. Another suggestion I received was to fill the question column so that every row had a question listed. I did consider this, but the problem then is during the data import, I'd have to convince my CMS to put all the answers back together based on the question, something I'm sure Drupal COULD do, but I'm not sure how to do that either. Finally, this project is a spreadsheet with 225,270 rows, so you can imagine why I'd like a process that is reasonably trustworthy AND that can run locally. Anyway, any/all additional suggestions appreciated, even if they are try searching for blah blah python parser, or I made something that solves a similar process, and you can download it from Git. Ryan ___ Ryan Engel Web Stuff UW-Madison Dana Pearson mailto:dbpearsonm...@gmail.com June 13, 2014 at 7:14 PM I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transform bibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana Terry Brady mailto:tw...@georgetown.edu June 13, 2014 at 6:53 PM The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table Ryan Engel mailto:rten...@wisc.edu June 13, 2014 at 4:28 PM Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
Rather than looking for strategies to convert directly from excel to XML, why don't you consider first moving to CSV and then converting that to XML? Once you've got CSV, then you could either write a script to do the conversion or use something like XSLT. Hope that helps, -Adam --- Adam Doan Research Enterprise and Scholarly Communication University of Guelph Library 519.824.4120 x52974 do...@uoguelph.ca - Original Message - From: Ryan Engel rten...@wisc.edu To: CODE4LIB@LISTSERV.ND.EDU Sent: Monday, June 16, 2014 12:29:53 PM Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import) Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based on those results, I wasn't sure if there is more tweaking I could do in Excel, or if there is some programmed logic in Excel that can't be accounted for when associating a schema. Another suggestion I received was to fill the question column so that every row had a question listed. I did consider this, but the problem then is during the data import, I'd have to convince my CMS to put all the answers back together based on the question, something I'm sure Drupal COULD do, but I'm not sure how to do that either. Finally, this project is a spreadsheet with 225,270 rows, so you can imagine why I'd like a process that is reasonably trustworthy AND that can run locally. Anyway, any/all additional suggestions appreciated, even if they are try searching for blah blah python parser, or I made something that solves a similar process, and you can download it from Git. Ryan ___ Ryan Engel Web Stuff UW-Madison Dana Pearson mailto:dbpearsonm...@gmail.com June 13, 2014 at 7:14 PM I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transform bibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana Terry Brady mailto:tw...@georgetown.edu June 13, 2014 at 6:53 PM The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table Ryan Engel mailto:rten...@wisc.edu June 13, 2014 at 4:28 PM Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
This should be quite doable in most programming languages with out-of-the-box tools and no tricky parsing code. The gist is to save in Excel as a delimited text file (tab is a good choice), then have your script ingest the document and turn it into an array, and then turn the array into XML. In Python, it could be something like the code below (not tested but the principles should be sound): import 'csv' from elementtree.ElementTree import Element, SubElement #create a list mylist = [] #open your delimited file with a csv reader with open('myfile.txt', 'rb') as textfile: reader = csv.reader( textfile, delimiter='\t', quotechat='') #this assumes your file is tab-delimited (\t) #loop through rows in your file and save each row as a key/value pair (dictionary) for row in textfile: fields = { 'field1': row[0] 'field2': row[1] 'field3': row[2] 'field4': row[3] } #append this row to our master list mylist.append( fields ) #create an xml root node rootNode = Element(XmlRoot) #loop through our list of rows from the text file and create xml nodes for row in mylist: rowNode = Element(record) #loop through all the fields on this row and turn them into xml nodes for fieldName, fieldValue in row: fieldNode = Element(fieldName) fieldNode.text = fieldValue #append each field node to the parent row node rowNode.append(fieldNode) #append each row node to the document root node rootNode.append(rowNode) #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Monday, June 16, 2014 1:04 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import) I'd just do this the old fashioned way. Awk is great for problems like this. For example, if your file is tab delimited, the following should work awk '{FS=\t}{if ($2 != ) question = $2;}{print $1,question,$3}'' yourfile In the example above, I just print the fields but you could easily encase them in tags. kyle On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel rten...@wisc.edu wrote: Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based on those results, I wasn't sure if there is more tweaking I could do in Excel, or if there is some programmed logic in Excel that can't be accounted for when associating a schema. Another suggestion I received was to fill the question column so that every row had a question listed. I did consider this, but the problem then is during the data import, I'd have to convince my CMS to put all the answers back together based on the question, something I'm sure Drupal COULD do, but I'm not sure how to do that either. Finally, this project is a spreadsheet with 225,270 rows, so you can imagine why I'd like a process that is reasonably trustworthy AND that can run locally. Anyway, any/all additional suggestions appreciated, even if they are try searching for blah blah python parser, or I made something that solves a similar process, and you can download it from Git. Ryan ___ Ryan Engel Web Stuff UW-Madison Dana Pearson mailto:dbpearsonm...@gmail.com June 13, 2014 at 7:14 PM I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transform bibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
Sorry, the last line got messed up by outlook. #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Joshua Welker [mailto:wel...@ucmo.edu] Sent: Monday, June 16, 2014 2:32 PM To: Code for Libraries Subject: RE: [CODE4LIB] Excel to XML (for a Drupal Feeds import) This should be quite doable in most programming languages with out-of-the-box tools and no tricky parsing code. The gist is to save in Excel as a delimited text file (tab is a good choice), then have your script ingest the document and turn it into an array, and then turn the array into XML. In Python, it could be something like the code below (not tested but the principles should be sound): import 'csv' from elementtree.ElementTree import Element, SubElement #create a list mylist = [] #open your delimited file with a csv reader with open('myfile.txt', 'rb') as textfile: reader = csv.reader( textfile, delimiter='\t', quotechat='') #this assumes your file is tab-delimited (\t) #loop through rows in your file and save each row as a key/value pair (dictionary) for row in textfile: fields = { 'field1': row[0] 'field2': row[1] 'field3': row[2] 'field4': row[3] } #append this row to our master list mylist.append( fields ) #create an xml root node rootNode = Element(XmlRoot) #loop through our list of rows from the text file and create xml nodes for row in mylist: rowNode = Element(record) #loop through all the fields on this row and turn them into xml nodes for fieldName, fieldValue in row: fieldNode = Element(fieldName) fieldNode.text = fieldValue #append each field node to the parent row node rowNode.append(fieldNode) #append each row node to the document root node rootNode.append(rowNode) #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Monday, June 16, 2014 1:04 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import) I'd just do this the old fashioned way. Awk is great for problems like this. For example, if your file is tab delimited, the following should work awk '{FS=\t}{if ($2 != ) question = $2;}{print $1,question,$3}'' yourfile In the example above, I just print the fields but you could easily encase them in tags. kyle On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel rten...@wisc.edu wrote: Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based on those results, I wasn't sure if there is more tweaking I could do in Excel, or if there is some programmed logic in Excel that can't be accounted for when associating a schema. Another suggestion I received was to fill the question column so that every row had a question listed. I did consider this, but the problem then is during the data import, I'd have to convince my CMS to put all the answers back together based on the question, something I'm sure Drupal COULD do, but I'm not sure how to do that either. Finally, this project is a spreadsheet with 225,270 rows, so you can imagine why I'd like a process that is reasonably trustworthy AND that can run locally. Anyway, any/all additional suggestions appreciated, even if they are try searching for blah blah python parser, or I made something that solves a similar process, and you can download it from Git. Ryan ___ Ryan Engel Web Stuff UW-Madison Dana Pearson mailto:dbpearsonm...@gmail.com June 13, 2014 at 7:14 PM I don't use Excel but a client did who wanted to use XSL I had created ONIX
Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import)
Ryan, it sounds like you simply want to pull two relational tables into drupal using referenced entities in a one (question) to many (answers) relationship? This can be accomplished, albeit unintuitively (it is Drupal afterall), by using the Feeds CSV parser with Feeds Tamper, and I would skip what seems like an unnecessary conversion to XML step. First, you import your questions and set a GUID target (typically your db id). Then import your answers -- each should have a reference to the id mapped using the Entity Reference by Feeds GUID. I have done it in reverse order too, in which case you'd import all your answers first. In this case, your second (questions) import needs to contain a single column of delimited answer GUIDs into a single column. Use the Feeds Tamper explode plugin on that field during mapping as explained (rather vaguely) here: http://drupal.stackexchange.com/questions/32234/how-to-use-feeds-module-to-import-multi-value-fields If that's not what you're trying to do, can you clarify? -Shaun On 6/16/14 3:35 PM, Joshua Welker wrote: Sorry, the last line got messed up by outlook. #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Joshua Welker [mailto:wel...@ucmo.edu] Sent: Monday, June 16, 2014 2:32 PM To: Code for Libraries Subject: RE: [CODE4LIB] Excel to XML (for a Drupal Feeds import) This should be quite doable in most programming languages with out-of-the-box tools and no tricky parsing code. The gist is to save in Excel as a delimited text file (tab is a good choice), then have your script ingest the document and turn it into an array, and then turn the array into XML. In Python, it could be something like the code below (not tested but the principles should be sound): import 'csv' from elementtree.ElementTree import Element, SubElement #create a list mylist = [] #open your delimited file with a csv reader with open('myfile.txt', 'rb') as textfile: reader = csv.reader( textfile, delimiter='\t', quotechat='') #this assumes your file is tab-delimited (\t) #loop through rows in your file and save each row as a key/value pair (dictionary) for row in textfile: fields = { 'field1': row[0] 'field2': row[1] 'field3': row[2] 'field4': row[3] } #append this row to our master list mylist.append( fields ) #create an xml root node rootNode = Element(XmlRoot) #loop through our list of rows from the text file and create xml nodes for row in mylist: rowNode = Element(record) #loop through all the fields on this row and turn them into xml nodes for fieldName, fieldValue in row: fieldNode = Element(fieldName) fieldNode.text = fieldValue #append each field node to the parent row node rowNode.append(fieldNode) #append each row node to the document root node rootNode.append(rowNode) #now save the whole thing as an xml file with open('myfile.xml', 'wb') as file ElementTree(rootNode).write(file) Josh Welker -Original Message- From: Code for Libraries [mailto:CODE4LIB@LISTSERV.ND.EDU] On Behalf Of Kyle Banerjee Sent: Monday, June 16, 2014 1:04 PM To: CODE4LIB@LISTSERV.ND.EDU Subject: Re: [CODE4LIB] Excel to XML (for a Drupal Feeds import) I'd just do this the old fashioned way. Awk is great for problems like this. For example, if your file is tab delimited, the following should work awk '{FS=\t}{if ($2 != ) question = $2;}{print $1,question,$3}'' yourfile In the example above, I just print the fields but you could easily encase them in tags. kyle On Mon, Jun 16, 2014 at 9:29 AM, Ryan Engel rten...@wisc.edu wrote: Thanks for the responses, on the list and off, so far. As I'm sure is true for so many of us, my interest in learning more about how to solve this type of problem is balanced against my need to just get the project done so I can move on to other things. One of the great things about this list is the ability to learn from the collective experiences of colleagues. For this project specifically, even clues about better search terms is useful; as Chris Gray pointed out, basic Google searches present too many hits. I did try following the Create an XML data file and XML schema file from worksheet data instructions on the Microsoft site. And it did produce an XML document, but it wasn't able to transform this: Row1Question1Q1Answer1 Row2Q1Answer2 ...into something like this: row1Row One Data/row1 question1This is a question/question1 answers q1answer1Answer 1/q1answer1 q1answer2Answer2/q1answer2 /answers Instead, I could get it to either convert every row into its own XML entry, meaning that I had a lot of answers with no associated questions, or I got an XML file that had 1 question with EVERY SINGLE answer nested beneath it -- effectively all questions after the first question were ignored. Based
Re: [CODE4LIB] Excel to XML
You could also look into using an xsd. It's a pretty simple document that allows you to define the structure of your xml file and assign excel columns to fields. That way, when you save excel as xml, you can have the proper hierarchy in your xml. There are some quirks, but overall it's pretty effective. Amelia Mowry Metadata Discovery Services Librarian Wayne State University On Friday, June 13, 2014, Dana Pearson dbpearsonm...@gmail.com wrote: I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transformbibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana On Fri, Jun 13, 2014 at 6:53 PM, Terry Brady tw...@georgetown.edu wrote: The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote: Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly.-- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries
[CODE4LIB] Excel to XML
Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly. In the interest immanent deadlines, I have handed the spreadsheet off to my Perl-writing colleague. But as a professional growth opportunity, I'm interested in suggestions from Libraryland about ways others have approached this successfully. Thanks! Ryan Engel Web Stuff UW-Madison smime.p7s Description: S/MIME Cryptographic Signature
Re: [CODE4LIB] Excel to XML
The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote: Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly. In the interest immanent deadlines, I have handed the spreadsheet off to my Perl-writing colleague. But as a professional growth opportunity, I'm interested in suggestions from Libraryland about ways others have approached this successfully. Thanks! Ryan Engel Web Stuff UW-Madison -- Terry Brady Applications Programmer Analyst Georgetown University Library Information Technology https://www.library.georgetown.edu/lit/code 425-298-5498
Re: [CODE4LIB] Excel to XML
I don't use Excel but a client did who wanted to use XSL I had created ONIX to MARC to transformbibliographic metadata in Excel to XML. The built in Excel XML converter was not very helpful since empty cells were skipped so that it was impossible to use that result. There is an add on that allow you to map your data to XML elements by creating a schema which is pretty cool. http://bit.ly/1jpwtqM This might be helpful. regards, dana On Fri, Jun 13, 2014 at 6:53 PM, Terry Brady tw...@georgetown.edu wrote: The current version of Excel offers a save as XML option. It will produce something like this. There is other wrapping metadata, but the table is pretty easy to parse. Table ss:ExpandedColumnCount=3 ss:ExpandedRowCount=7 x:FullColumns=1 x:FullRows=1 ss:DefaultRowHeight=15 Row Cell ss:StyleID=s62Data ss:Type=Stringrow 1/Data/Cell CellData ss:Type=Stringquestion 1/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 2/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 3/Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 4/Data/Cell CellData ss:Type=Stringquestion 2/Data/Cell CellData ss:Type=Stringanswer 1/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 5 /Data/Cell Cell ss:Index=3Data ss:Type=Stringanswer 2/Data/Cell /Row Row Cell ss:StyleID=s62Data ss:Type=Stringrow 6/Data/Cell CellData ss:Type=Stringquest /Data/Cell CellData ss:Type=Stringanswer 3/Data/Cell /Row Row Cell ss:StyleID=s62/ /Row /Table On Fri, Jun 13, 2014 at 2:28 PM, Ryan Engel rten...@wisc.edu wrote: Hello - I have an Excel spreadsheet that, for the purposes of an easy import into a Drupal site, I'd like to convert to XML. I know people more knowledgeable than I could code up something in Python or Perl to convert a CSV version of the data to XML (and I have a colleague who offered to do just that for me), but I am looking for recommendations for something more immediately accessible. Here's an idea of how the spreadsheet is structured: Row1Question1Q1Answer1 Row2Q1Answer2 Row3Q1Answer3 Row4Question2Q2Answer1 Row5Q2Answer2 Row6Question3Q3Answer1 etc. How do other people approach this? Import the data to an SQL database, write some clever queries, and then export that to XML? Work some wizardry in GoogleRefine/OpenRefine? Are scripting languages really the best all around solution? Excel's built in XML mapping function wasn't able to process the one-to-many relationship of questions to answers, though maybe I just don't know how to build the mapping structure correctly. In the interest immanent deadlines, I have handed the spreadsheet off to my Perl-writing colleague. But as a professional growth opportunity, I'm interested in suggestions from Libraryland about ways others have approached this successfully. Thanks! Ryan Engel Web Stuff UW-Madison -- Terry Brady Applications Programmer Analyst Georgetown University Library Information Technology https://www.library.georgetown.edu/lit/code 425-298-5498 -- Dana Pearson dbpearsonmlis.com Metadata and Bibliographic Services for Libraries